Journal of Applied Psychology 


Edited by Donald G. Paterson, University of Minnesota 
Consulting Editors 


George K. Bennett, Psychological Corporation 
Walter V. Bingham, Washington, D. C. 
Harold E. Burtt, Ohio State University 
Allen L. Edwards, University of Washington 
Clifford E. Jurgensen, Minneapolis Gas Co. 
Irving Lorge, T. C. Columbia University 
Quinn McNemar, Stanford University 


Alexander Mintz, City College of New York 
James P. Porter, Danville, Illinois 
Julian B. Rotter, Ohio State University 
Edward K. Strong, Jr., Stanford University 
Donald E. Super, T. C. Columbia University 
Morris S. Viteles, University of Pennsylvania 
Alfred C. Welch, Knox-Reeves, Minneapolis 





Table of Contents 
The Adequacy of Employee Selection Reports: M. H. Jones 


Cross Validation of an Abbreviated Point Job Evaluation System: M. K. Davis and J. Tiffin


Age and Route Sales Efficiency: C. B. Cover and S. L. Pressey 

Ortho-Rater Norms and Sex Differences: J. H. Ely, N. C. Kephart, and J. Tiffin 
Fluorescent Light Versus Daylight: J. S. Gray and P. Prevetta 

eee in the Predictive Value of a Battery of Tests: R. M. W. Travers and W. L. 


Intercorrelations in Merit Rating Traits: C. E. J 
How Readable Are Corporate Annual Reports? S. Pashalian and W. J. E. Crissy


A Combined Oral Reading and Psychogalvanic Response Technique for Investigating Certain Reading Abilities of College Students: H. L. J. Carter


Geographical Sampling in Testing the Appeal of Radio Broadcasts: J. G. Peatman and T. Hallonquist


The Effect of Color in Direct Mail Advertising: J. W. Dunlap 
Brand Discrimination Among Cigarette Smokers: C. K. Ramond, L. H. Rachal, and M. R. Marks





American Psychological Association 


Vol. 34, No. 4 


August, 1950 





Journal of Applied Psychology 


Published Bi-monthly by the American Psychological Association, Inc. 
Prince and Lemon Sts., Lancaster, Pa. 


Annual subscription, $6.00; single copies, $1.25 


Subscriptions and business communications should be sent to 
American Psychological Association 
1515 Massachusetts Avenue N.W. 
Washington 5, D. C. 


Articles for publication and books for review should be sent to the Editor 


Professor Donald G. Paterson, Department of Psychology 
University of Minnesota, Minneapolis 14, Minnesota 





This Journal gives prompt consideration to 
manuscripts reporting original investigations in 
any field of applied psychology except clinical 
and consulting psychology. A descriptive or 
theoretical article is occasionally accepted if it 
deals in a distinctive manner with a problem of 
applied psychology. The policy is, however, to 
favor papers dealing with quantitative investi- 
gations of direct value to psychologists working 
in the following fields: Vocational diagnosis and 
occupational guidance; educational diagnosis, 
prediction and guidance at the secondary school 
level and higher; personnel selection, training, 
placement, transfer and promotion in business, 
industry and government service including the 
armed forces; supervisory training in business, 
industry and government; bio-mechanics or design
of machines to fit the human operator;
illumination, ventilation and fatigue in industry;
job analysis, description, classification and
evaluation; measurement of morale of executives,
supervisors, or employees; surveys of opinion on
social or political issues, such as those conducted
by The Psychological Corporation; psychological
problems in market research and in advertising.


Articles may be under 500 words. The maximum
is 12,000 words, the average in the
neighborhood of 4,000 words. To reduce lag of
publication, adherence to the rule of “brevity
consistent with clarity” is encouraged.


A lapse of six to twelve months occurs between 
acceptance of an article and its publication, the 
lag varying with the rate at which manuscripts 
are submitted. If, however, an author is pre- 
pared to defray the costs of printing the neces- 
sary extra pages, he may arrange for earlier 
publication without thereby postponing the ap- 
pearance of manuscripts by other contributors. 
This enables the management to provide space in 
addition to the scheduled 64 pages per issue. 
“Early publication” is thus a direct contribution 
to the subscribers. By cutting down lag in pub- 
lication, it also benefits those authors whose 
articles are published in regular turn. 


Tables, footnotes and references as well as 
text of manuscripts should be typed double-spaced 
throughout. Authors should adhere to the con- 
ventions described by J. E. Anderson and W. 
L. Valentine in “The preparation of articles for 
publication in the journals of the American 
Psychological Association,” Psychol. Bull., 1944, 
41, 345-376. A reprint of this article will be 
loaned to any prospective contributor who does 
not find it in his library. 


Entered as second-class matter, August 19, 1943, at the post office at Lancaster, Pa., under the act of March 3, 1879 
Acceptance for mailing at the special rate of postage provided for in paragraph (d-2), Section 34.40,
P. L. & R. of 1948, October 10, 1947


Copyright, 1950, by The American Psychological Association, Inc. 













The Adequacy of Employee Selection Reports 


Margaret Hubbard Jones 
University of California at Los Angeles 


An examination of well over 2,100 references 
from the obtainable world literature on em- 
ployee selection has permitted an analysis of 
practices in both experimental design and re- 
port, the results of which are perhaps surprising. 
These references cover the period 1906 to 1948, 
within the ability of the author to locate the 
references and within the ability of reference 
librarians to locate copies of unusual and for- 
eign periodicals and monographs. There are 
further limitations to the data to be analyzed 
here which were imposed by the primary 
purpose of the literature search. That aim was 
the compilation of abstracts of employee selec- 
tion reports which should contain the actual 
data presented, together with sufficient infor- 
mation to enable the reader to evaluate the 
study without referring to the original report. 
The work was prompted by the difficulty ex- 
perienced in this particular field in locating 
the widely scattered references (we had reference
to more than 300 separately-titled periodicals,
and many volumes of each one, as
well as books and monographs) and by the
fact that most industrial psychologists do not 
have the time or facilities necessary to review 
this literature. These abstracts appear else- 
where (2). 

Due to the large volume of material in this 
field and to the fact that many articles which 
seem important contain virtually no informa- 
tion, it did not seem economic to abstract all 
possible references, and the survey is thus 
limited to those studies which can be evaluated: 
those in which relatively complete validation 
data are presented, together with specific tests 
used, N and job studied. Further, since we 
were interested in selection of employees for 
industrial concerns we have also excluded the 
special fields of selection for the armed forces
and pilot training as posing special problems.
It has by now become abundantly obvious that 
seemingly slight changes in working conditions, 
incentives and parent population, to mention 
the more obvious factors, may result in the 
failure of even a carefully executed selection 
program. In view of this, it seemed wiser to 
exclude those studies in which the criterion 
was school grades or teachers’ ratings, and 
those carried out on military personnel even 
where the jobs are similar to civilian jobs, be- 
cause of the differences in motivation and 
working conditions. After we have eliminated 
the reports which are, by our definition, special 
problems and those which are so inadequately 
presented or executed that they cannot even be 
evaluated by the reader (a large proportion of 
the total) there remain 427 reports, or 20% of 
the total number of references. In this analysis 
we shall be concerned entirely with these 427 
reports which can be evaluated. Since many 
articles report results on diverse jobs, very 
often with different tests and different statis- 
tical procedures, we have at times referred to 
the number of separately-treated groups rather 
than to the number of titles and have endeav- 
ored to make the distinction clear wherever the 
former occurs. 

These 427 references are largely American— 
about 80%—both because of the greater avail- 
ability of privately published and unpublished 
material of American origin and because of the 
larger total volume of articles. It does not ap- 
pear that the percentage of acceptable articles 
is much larger for American work than for that 
of any other country. 

The volume of acceptable articles is shown 
by year in Figure 1. The slow rise to a peak 
after World War I, followed by a decline 
through the depression era is not unexpected.
Whether this can be entirely attributed to overselling
of testing, as is usually done, or whether
it does not also reflect to some extent the gen- 
eral decline in business activity is a debatable 
point. The annual volume of articles reached 
a high again in 1941, as business was recover- 
ing, fell off, understandably, during the war 
years, and has now reached an all-time high. 
Whether it will remain high probably depends 
partly on the quality of current work and 
partly on the general level of industrial activity. 

The ten jobs which have been most fre- 
quently studied are as follows: Salesmen, 75; 
Clerical Workers, 60; Teachers, 49; Assemblers, 
23; Executives, 23; Inspectors, 23; Supervisors, 
21; Typists, 17; Stenographers, 14; and Machinists,
9. This order does not necessarily
reflect either the importance of the job or the 
difficulty of selecting good workers. Among 
the jobs which appear to have been acceptably 
reported but once are brick-layers, grocers, 
scientists and deans of women. As can be 
seen, salesmen lead the list. This, of course, 
includes salesmen of all sorts and many of the 
jobs are quite different. The same comment 
applies to the other categories. There is
usually no way of determining from the published
reports whether the jobs in two studies
are comparable. As Ghiselli has shown, there 
is an astounding range of reported validity 
coefficients for the same general type of test 
in any broad occupational classification (3). 
For clerical occupations he found the range to 
be over .90. There are many reasons for this
state of affairs but one which has perhaps not 
been sufficiently emphasized is the lack of 
adequate job description in published reports. 
It is now fairly generally recognized that a 
selection program is very much situation- 
bound but the corollary, that this requires 
precise job description, is not currently 
practiced. 

Let us now examine in more detail the 427
reports which represent the cream of the crop.
The number of subjects used in the investigation
is an important factor in determining the
predictive value of the results. Except in the
majority of cases using fewer than 20 subjects,
the N in and of itself does not tell the whole
story. Much depends upon how the data are
treated and whether or not the total population
of employees on a particular job, or a representative
sample thereof, was used. Nevertheless,
it is instructive to analyze the trend in
this respect. The results of the analysis by
number of subjects are as follows: Fewer than 10,
17; 10-19, 97; 20-29, 93; 30-49, 129; 50-99,
188; and 100 and above, 257. Even more unexpected
than the number of groups with small
N is the number with 50 or more and the 257
groups containing 100 or more subjects. The
latter are by and large the more recent studies
and the trend is encouraging.

Fig. 1. Number of employee selection reports by year.


Statistical Techniques 


An analysis of the statistical techniques used
for presentation of the results of validation
procedures is interesting, but again the particular
statistic used does not guarantee adequacy
of treatment because the assumptions governing
its use may not have been met and the
statistic best suited to a given problem may
not have been chosen. Table 1 shows the
frequency with which various measures are
used. Correlational techniques are the most
popular, accounting for 285 out of 525 separately-treated
groups. Of these only 172 give
measures of significance, and although they can
be calculated from the data provided, it is
safer, considering the heterogeneous nature of
the audience in this field, to present the standard
errors along with the coefficients of correlation.
Furthermore, the author has the real
responsibility for complete presentation of all
the statistics necessary to an interpretation of
his investigation. Occasionally one even finds
an author concluding that the correlations reported
have clearly shown a relationship between
test scores and criterion whereas actual
calculation shows the correlations to be not
significantly different from zero. This practice
is general and no single individual should
shoulder the blame for it.


Table 1

Statistical Measures Used for Validation

Measure                     Number of Groups So Treated

Measure of Correlation
    r                       136
    Rho                      94
    R                        35
    r bis                     7
    tetrachoric
    other
    Total                   285
Group Comparison            185
Inadequate Treatment         55
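
The check on a reported correlation that this section calls for can be sketched in a few lines. This is only an illustration: it uses the Fisher z approximation, which is an assumption of the sketch and not necessarily the calculation used in the survey, and the function name and the example values of r and N are hypothetical.

```python
import math
from statistics import NormalDist


def r_significance(r, n):
    """Return (z, two-sided p) for the hypothesis that the population r is zero.

    Uses the Fisher z transformation, whose standard error is 1/sqrt(n - 3).
    """
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical values: the same r = .25 reported for a small and a large group.
z_small, p_small = r_significance(0.25, 30)
z_large, p_large = r_significance(0.25, 200)
print(f"N=30:  z={z_small:.2f}, p={p_small:.3f}")
print(f"N=200: z={z_large:.2f}, p={p_large:.4f}")
```

With these illustrative figures the coefficient falls short of the conventional 5% level in the group of 30 but not in the group of 200, which is why a coefficient reported without N or a standard error cannot be interpreted.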

Group comparisons of various sorts account 
for 185 cases, but of these only 28 include 
measures of the significance of group differences 
(although sometimes such measures could be 
calculated by the reader). Group comparisons 
may take such forms as: differences in mean 
test scores between the upper and lower 50% 
of employees as judged by the criterion, or 
average test scores for groups judged best, 
average and poorest by their supervisors (many 
times without N or sigma for each group being 
indicated), or per cent of those scoring within 
certain limits who were judged good as against 
the per cent who were judged poor, etc. Only 
occasionally are critical ratios or t ratios or
similar measures included. The importance of 
testing results for significance cannot be over- 
emphasized. In the case of group comparisons 
it is more serious than where correlation is 
used because in most cases there are not suffi- 
cient data to enable the reader to perform the 
proper calculations for himself. In one partic- 
ular case, where the author concluded that 
his tests were efficacious for selection but 
neglected to supply any measure of the signifi- 
cance of the differences found between groups, 
calculation of the significance of percentage dif- 
ferences (the only data available) showed the 
differences to be exceedingly insignificant. 
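
A calculation of the kind described, the significance of a difference between two percentages, might look like the following sketch. It uses the pooled-proportion critical ratio, which is assumed here for illustration; the function name and the counts are hypothetical and are not taken from the study referred to above.

```python
import math


def critical_ratio_percent(good_pass, good_n, poor_pass, poor_n):
    """Critical ratio for the difference between two independent proportions."""
    p_good = good_pass / good_n
    p_poor = poor_pass / poor_n
    pooled = (good_pass + poor_pass) / (good_n + poor_n)
    if pooled in (0.0, 1.0):
        raise ValueError("no variation in the pooled group")
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / good_n + 1.0 / poor_n))
    return (p_good - p_poor) / se


# Hypothetical data: 12 of 20 "good" and 9 of 20 "poor" employees
# score above the critical score (60% against 45%).
cr = critical_ratio_percent(12, 20, 9, 20)
print(f"critical ratio = {cr:.2f}")  # about 0.95, well short of 1.96
```

A 15-point difference in per cent looks impressive in a table, yet with 20 subjects per group it is well within chance expectation. The reader can verify this only if the N for each group is reported.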

In 55 cases there is incomplete statistical 
analysis. In a few cases the raw data are pre- 
sented with no summary statistics. In many 
instances we find the results expressed only as 
“per cent agreement” between test scores and 
criterion scores, or a brief statement that a 
critical score of a certain magnitude would 
have eliminated a given per cent of the poor 
group and ordinarily a smaller per cent— 
although we do not know that it is a reliably 
smaller per cent—of the good group. In 5 
cases the authors are content to present graphs 
alone, sometimes with the differences very 
much exaggerated by the scale and baseline 
chosen. 

One gains the impression that many times, 
even where adequate statistics are used, the 
basic requirements for their use have not been
met.1 One should be able to assume, for example,
that when an r is reported the conditions
for its proper use have been met, but in view of 
the inadequacy of many of the statistical treat- 
ments one cannot always so assume. A more 
obvious criticism of many studies is the manner 
in which subjects are selected. In spite of the 
fact that the assumptions underlying many of 
the statistics used require a reasonably random 
sample, biased rather than random sampling 
seems to be the rule. It is a common pro- 
cedure to select certain employees to serve as 
subjects but rarely are we given any informa- 
tion which indicates that the sample was a 
selected one or how it was selected. A frequent 
practice is the artificial creation of a hetero- 
geneous experimental group by the use of only 
extreme employee groups (the upper and lower 
25%, for example), a practice which may 
spuriously raise the validity coefficients. 
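
The inflation produced by extreme groups can be illustrated with a small simulation. This is a sketch under stated assumptions: test and criterion scores are drawn from a bivariate normal population with a true correlation of .30, and the sample size, seed, and function names are hypothetical choices made for the illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def extreme_group_demo(n=10000, rho=0.30, tail=0.25, seed=1950):
    """Return (r in the full group, r in the upper and lower tails only)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        test = rng.gauss(0.0, 1.0)
        criterion = rho * test + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((test, criterion))
    full_r = pearson_r([t for t, _ in pairs], [c for _, c in pairs])
    # Keep only the employees in the lowest and highest 25% on the criterion.
    ranked = sorted(pairs, key=lambda pair: pair[1])
    k = int(n * tail)
    extremes = ranked[:k] + ranked[-k:]
    extreme_r = pearson_r([t for t, _ in extremes], [c for _, c in extremes])
    return full_r, extreme_r


full_r, extreme_r = extreme_group_demo()
print(f"all employees: r = {full_r:.2f}; extreme groups only: r = {extreme_r:.2f}")
```

Dropping the middle half of the group enlarges the criterion variance, so the coefficient computed on the extremes overstates the validity that would be found in the whole group.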

A point too often overlooked is that a selec- 
tion program is intended to select among 
applicants, not among employees, and the two 
groups are not identical (cf. 5, 7, 10). “Natural
selection” on the job (the survival of the
fittest) operates to make the employee group
more homogeneous than the applicant group.
This may spuriously lower validity coefficients
and change critical scores. Further, the employee
group will often not show a normal distribution
in a trait which is highly correlated
with ability to produce on the job and since 
in most industrial situations it will be im- 
possible to correct for this error, the usefulness 
of the employee group as a basis for a selection 
program is further limited. The best practical 
solution to both problems—that of bias in 
sampling and that of restriction of range in em- 
ployee groups—seems to be the use of two 
groups: first, a randomly selected employee 
group as a trial group, for reasons of economy, 
and second, an unselected applicant group as 
a follow-up group, to discover whether or not 
the selection program will select among applicants
as well as among employees. This solution,
the use of separate groups, has the further
advantage of permitting a pragmatic estimate
of the shrinkage in multiple correlation. This
is a real advantage. An example is Selover’s
two samples of clerical workers (N = 193 and
85, respectively) which yielded multiple correlations
for 4 tests and criterion of .41 and .33,
respectively (9). An instructive example of
the danger involved in putting one’s faith in a
single small sample, particularly when that
sample has been used to develop a scoring
procedure, is given by Kurtz (4). Here a
scoring technique for the Rorschach Test was
developed which classified correctly 79 out of
80 sales managers. This was so impressive to
many people concerned that they were prepared
to start using it as a selection device
immediately. A follow-up on a second sample
yielded a validity coefficient of .02!

1 The criticism of the use of statistical methods in
research on the Rorschach Test by Cronbach (1) may
be applied in part to employee selection research even
where other tests are used. Especially to be noted are
his criticisms of the selection for emphasis of a few
“significant” differences from among many insignificant
ones, whether the comparisons are explicitly made or
merely implied, and his insistence on the use of a second
independent sample so that chance variations in test
scores will not be given undue weight.
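
The value of a second, independent sample can be shown with a simulation in which no score has any real validity. This is a hedged sketch, not a reconstruction of either study cited above: the number of candidate scores, the group size, the seed, and the function names are all hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def chance_selection_demo(reps=200, n=40, n_scores=20, seed=4):
    """Pick the best of several worthless scores in sample A, then try it in sample B.

    Returns (mean |r| of the chosen score in A, mean signed r of that score in B).
    """
    rng = random.Random(seed)
    dev_total = 0.0
    holdout_total = 0.0
    for _ in range(reps):
        crit_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        crit_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        best_a, best_b = 0.0, 0.0
        for _ in range(n_scores):
            score_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
            score_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
            r_a = pearson_r(score_a, crit_a)
            if abs(r_a) > abs(best_a):
                best_a = r_a
                best_b = pearson_r(score_b, crit_b)
        dev_total += abs(best_a)
        # Score the follow-up r in the direction chosen in the first sample.
        holdout_total += math.copysign(1.0, best_a) * best_b
    return dev_total / reps, holdout_total / reps


dev_r, holdout_r = chance_selection_demo()
print(f"first sample: {dev_r:.2f}; follow-up sample: {holdout_r:.2f}")
```

The score selected because it looked best in the first sample shows a sizable coefficient there and essentially none in the follow-up sample, which is the pattern a single-sample report cannot reveal.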

The criterion is, of course, a question of ut- 
most importance but we cannot discuss all the 
ramifications of the problem here. Entirely 
aside from the question of the applicability of
the criterion as a real measure of job success
(the validity of the criterion), we find a problem
in the reliability of the criterion. Only
95 reports, or 22% of the 427 acceptable re- 
ports, make some attempt to include measures 
of the reliability of the criterion, and yet it has 
a profound influence upon the results of the 
validation procedures. Of course, low relia- 
bility will not give spuriously high validities— 
rather the opposite—but many studies appear 
to lead to the conclusion that certain tests are 
worthless in a given situation, whereas low 
criterion reliability may actually be masking 
a significant relationship. Surely, if a study is 
worth doing at all, the reliability of the cri- 
terion should be ascertained. 
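
The masking effect of an unreliable criterion can be made concrete with the classical attenuation relation, in which the observed coefficient equals the true one multiplied by the square root of the product of the two reliabilities. The article does not state this formula; it is supplied here as an assumption of the sketch, and the function names and numerical values are hypothetical.

```python
import math


def observed_validity(true_validity, criterion_reliability, test_reliability=1.0):
    """Validity expected after attenuation by unreliable measures."""
    return true_validity * math.sqrt(criterion_reliability * test_reliability)


def correct_for_criterion_unreliability(observed_r, criterion_reliability):
    """Estimate the validity against a perfectly reliable criterion."""
    if not 0.0 < criterion_reliability <= 1.0:
        raise ValueError("criterion reliability must lie in (0, 1]")
    return observed_r / math.sqrt(criterion_reliability)


# Hypothetical case: a true validity of .50 and a rating criterion
# whose reliability is only .36.
seen = observed_validity(0.50, 0.36)
restored = correct_for_criterion_unreliability(seen, 0.36)
print(f"observed r = {seen:.2f}, corrected r = {restored:.2f}")
```

A test with a true validity of .50 would appear to have a validity of only .30 against such a criterion, and in a small group that figure could easily fail to reach significance.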

Another difficulty in connection with the 
criterion is the operation of external influences 
such as age, experience and length of time on 
the job.2 Unless cognizance is taken of these
variables the results are difficult to interpret, to
say the least, and few studies control any of
these factors. For example, it is easy to see
how age may be predictive of job success if age
is influencing the criterion either in its own
right or operating through length of service,
and yet most studies lump together not only
all age groups but employees with widely differing
lengths of service. On the other hand,
if age or length of service is influencing the
criterion a significant relationship with test
scores may be masked. One often suspects a
further contaminating factor when ratings are
used if test scores are not kept strictly confidential
until after ratings are made. The
facts seem to indicate that more attention must
be paid to proper experimental design if the
results of selection studies are to be useful.

2 An attempt to compensate for bias in the direction
of longer service may be found in a recent study by
Rundquist and Bittner (8), and McMurry and Johnson
attempted to secure ratings on their subjects after
approximately equal time on the job (6).
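
One way of taking cognizance of a variable such as length of service is the first-order partial correlation. It is offered here only as an illustration and is not a procedure the article prescribes; the function name and the three coefficients are hypothetical.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical coefficients: test vs. criterion .40, test vs. length of
# service .50, criterion vs. length of service .60.
r_controlled = partial_r(0.40, 0.50, 0.60)
print(f"test-criterion r with service held constant = {r_controlled:.2f}")  # about 0.14
```

In this illustration most of the apparent validity of the test is carried by length of service. Treating employees of widely differing service as separate groups, as recommended later in the paper, addresses the same difficulty by design instead of by statistical adjustment.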

How many reports, then, meet all criteria of 
adequacy in both experimental design and 
report? We find that 46 out of the 427 
originally selected contain no second or follow- 
up group but are acceptable in all other re- 
spects, such as sufficiently large N, adequate 
and complete statistical presentation throughout,
etc. We further find that 17 are adequate
in all respects except that no measure of the reliability
of the criterion is presented. Finally,
if we count the total number of reports which
are satisfactory in all respects we discover
only eight, or .4% of the 2100 references with 
which we started. These eight studies are as 
follows: 


1. Bellows, R. M. Studies of clerical workers. Chap.
   VIII in Stead, W. H., Shartle, C. L., et al. Occupational
   counseling techniques. New York: American
   Book Co., 1940, ix + 273, pp. 144-146.
   (Study of coding clerks.)

2. Blum, M., and Candee, B. The selection of department
   store packers and wrappers with the aid
   of certain psychological tests. J. appl. Psychol.,
   1941, 25, 76-85.

3. Guilford, J. P., and Comrey, A. L. Prediction of
   proficiency of administrative personnel from personal
   history data. Educ. psychol. Measmt., 1948,
   8, 281-296.

4. Holliday, F. The relation between psychological
   test scores and subsequent proficiency of apprentices
   in the engineering industry. Occup. Psychol.,
   Lond., 1943, 17, 168-185.

5. Otis, J. L., Endler, O. L., and Kolbe, L. E. Data
   analysis methods. Chap. VII in Stead, W. H.,
   Shartle, C. L., et al. Occupational counseling techniques.
   New York: American Book Co., 1940,
   ix + 273, pp. 113-136. (Study of department
   store salespersons.)

6. Rundquist, E. A., and Bittner, R. H. Using
   ratings to validate personnel instruments: a study
   in method. Personnel Psychol., 1948, 1, 163-183.

7. Sartain, A. Q. Relation between scores on certain
   standard tests and supervisory success in an aircraft
   factory. J. appl. Psychol., 1946, 30, 328-332.

8. Selover, R. B. The development and validation
   of a battery of tests for the selection of clerical
   workers. Amer. Psychologist, 1948, 3, 291-292
   (abstract), and personal communication.


It is not intended to imply that these studies 
found highly predictive test batteries, but 
merely that the technique was adequate. Con- 
clusive negative findings are important and are 
too frequently ignored or even suppressed. 

In conclusion, let me emphasize two points. 
First, the actual work done by industrial 
psychologists is not as bad as would appear 
from this analysis, and the trend is definitely 
toward more complete and careful design and 
execution. In many cases our criticisms apply 
to the reports, not necessarily to the studies 
themselves. More care should be taken in the 
preparation of reports so that all relevant in- 
formation is available to the reader. 


Requirements of a “Good Report”


Perhaps a summary of the items one weary 
abstractor would like to see made explicit 
would be in order: 


1. Detailed job description, with each group 
treated separately. 

2. Complete description of the sample: N 
(sufficiently large), what proportion of the total 
population this represents and how selected, 
factors involved in hiring, age, length of time 
on the job (preferably with widely differing em- 
ployees treated as separate groups), and total 
experience in jobs of similar nature; use of two 
samples, one an applicant group. 

3. Exact test titles; when in the employment 
experience the tests were administered; whether
the tests were a factor in hiring; where the 
tests were given; under what conditions and 
incentives the tests were given; reliabilities of 
tests with comparable groups. 

4. Detailed description of the criterion; 
length of time on the job when the criterion 
measure was applied (with widely differing em- 
ployees treated as separate groups); reliability 
of the criterion; some discussion of the validity 
of the criterion selected; if ratings are used, 
some estimate of the amount of contact the 
rater has with the employee; if production
records are used, the duration of the period and
whether there were any unusual factors oper- 
ating at that time. 

5. Adequate statistical treatment, with as- 
surance that the assumptions governing the 
use of the given measures have been met, and 
actual report of the numerical results, together 
with an appropriate measure of significance. 


This may seem like a large order, but many 
adequately executed studies already reported 
could have included most of the items since it 
is obvious from certain remarks that the author 
must have taken them into consideration. In 
view of the untrustworthiness of many reports 
these items should be made explicit. 

A final point concerns those studies done by 
inadequately trained personnel. There are 
many of these and they are quite useless. 
They point to the ultimate desirability of some 
method of identification of properly qualified 
personnel for employee selection programs. 


Summary 


A survey of more than 2,100 references on 
employee selection in industry has revealed 
that only 427 contain sufficient information to 
permit evaluation of the study. These 427 
reports are analyzed in terms of annual volume, 
jobs most frequently investigated, statistics 
used in presentation of validity, number of 
subjects and general adequacy of design. 
This analysis reveals that many of these studies 
are inadequate to permit drawing conclusions 
as to the efficacy of the selection procedures 


Margaret Hubbard Jones 


employed. Factors which influence results 
but are difficult to evaluate from reports as 
they are usually published are discussed. 
Some recommendations for items to be included 
in reports of employee selection programs are 
presented. 


Received October 17, 1949. 


References 


1. Cronbach, L. J. Statistical methods applied to
   Rorschach scores. Psychol. Bull., 1949, 46, 393-429.

2. Dorcus, R. M., and Jones, M. H. Handbook of
   employee selection. New York: McGraw-Hill,
   1950.

3. Ghiselli, E. E. The validity of commonly employed
   occupational tests. Univ. of Calif. Publ. in
   Psychol., 1949, 5 (9), 253-288.

4. Kurtz, A. K. A research test of the Rorschach
   Test. Personnel Psychol., 1948, 1, 41-51.

5. MacMillan, M. H., and Rothe, H. F. Additional
   distributions of test scores of industrial employees
   and applicants. J. appl. Psychol., 1948,
   32, 270-274.

6. McMurry, R. N., and Johnson, D. L. Development
   of instruments for selecting and placing
   factory employees. Advanc. Mgmt., 1945, 10,
   113-120.

7. Rothe, H. F. Distribution of test scores of industrial
   employees and applicants. J. appl. Psychol.,
   1947, 31, 480-483.

8. Rundquist, E. A., and Bittner, R. H. Using ratings
   to validate personnel instruments: a study in
   method. Personnel Psychol., 1948, 1, 163-183.

9. Selover, R. B. The development and validation
   of a battery of tests for the selection of clerical
   workers. Amer. Psychologist, 1948, 3, 291-292,
   and personal communication.

10. Stromberg, E. L. Testing programs draw better
    applicants. Personnel Psychol., 1948, 1, 21-29.








Cross Validation of an Abbreviated Point Job Evaluation System 


Milton K. Davis and Joseph Tiffin 


Occupational Research Center, Purdue University 


Job evaluation has become widely used as a 
means of establishing equitable wage rates. 
Although there are many different methods 
available for constructing job evaluation sys- 
tems, the most widely used technique involves 
the use of some type of a point scale. The ap- 
proach used in point systems is to break the 
job into the various component items and as- 
sign points to each of these. The total points 
for all these items represent the evaluated job. 
During recent years there has been an attempt 
to construct scales which use fewer items than 
the longer and more involved ones. 

Previous Studies. Lawshe and his associates 
(3, 4, 5, 6, 7) have published much material 
on such abbreviated scales. Chesler (1) has 
also published research on this subject. The 
procedure used in these studies was to select 
the three or four most important items from 
the longer system by the Wherry-Doolittle 
technique. Abbreviated scales thus derived 
were then compared with the original scales to 
determine the amount of agreement between 
the two. These studies have demonstrated 
that these abbreviated scales yield results 
which are comparable to the original system. 

However, there has been some criticism
against this approach since it was necessary
first to analyze the original scale in order to
derive the abbreviated one. Otis (8, p. 98) has
stated this criticism in his recent book on job
evaluation as follows: “How would one go
about building an abbreviated scale? There
is no way of knowing which one of the shorter
scales obtained by Lawshe would best apply
to a given plant. Constructing a shorter scale
would necessitate a complete job evaluation
using a longer scale. Either factor analysis or
the Wherry-Doolittle technique would have to
be applied to the data, and a shorter scale so
derived would be used to keep the system up
to date. These savings would not be very
great.”

Purpose of this Study. The present study
was conducted to determine whether an abbreviated
job evaluation scale, constructed in
the light of key items previously identified by 
Lawshe (3, 4, 5), will achieve the same basic 
result as longer job evaluation systems. This 
abbreviated job evaluation system was de- 
rived without making a Wherry-Doolittle or 
factor analysis of the specific job evaluation 
data involved. Thus, the present study repre- 
sents a cross-validation of this abbreviated job 
evaluation scale against the hold-out group of 
other job evaluation installations. 


Procedure 


Derivation of the Abbreviated Scale. The ab- 
breviated point scale used in the present study 
was derived from the results of earlier research 
by Lawshe (4). That study analyzed the 
NEMA job evaluation system as it operated in 
three industrial plants. The NEMA job
evaluation system, which consists of eleven
items, was adapted by Kress (2) from the West- 
ern Electric Company’s procedure for use by 
the National Electric Manufacturer’s Associa- 
tion. It is one of the most widely used point 
job evaluation systems. 

From the three multiple regression equations 
obtained in Lawshe’s study, an average mul- 
tiple regression equation was determined for 
cross-validation. Only those items which ap- 
peared in at least two of the three regression 
equations were used. These items were ex- 
perience, unavoidable hazards, and initiative 
and ingenuity. Experience was the only item 
which was present in all three equations. The 
average value for each of these three items 
plus the average value for the constant was 
selected as the basis for further study. The 
final equation for predicting total points on the 
original system on the basis of the three items 
mentioned was: 

Total points = 41+1.5 (experience points) 
+1.7 (initiative and ingenuity points) +3.8 
(hazards points). 
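
The equation above can be applied to a single job as in the following minimal sketch. The constant and weights are copied from the text; the function name, argument names, and the sample item points are illustrative assumptions and do not come from the study.

```python
# Constant and weights as printed in the equation above.
CONSTANT = 41.0
W_EXPERIENCE = 1.5
W_INITIATIVE_INGENUITY = 1.7
W_HAZARDS = 3.8


def abbreviated_total_points(experience, initiative_ingenuity, hazards):
    """Predict total points on the original system from the three item ratings."""
    return (
        CONSTANT
        + W_EXPERIENCE * experience
        + W_INITIATIVE_INGENUITY * initiative_ingenuity
        + W_HAZARDS * hazards
    )


if __name__ == "__main__":
    # Hypothetical item points for one job (not data from the study).
    print(abbreviated_total_points(experience=44, initiative_ingenuity=28, hazards=10))
```

Each job in an installation would be scored this way from its three item ratings before the comparison with the original total points.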

Eight different companies submitted point 







226 Milton K. Davis and Joseph Tiffin 


job evaluation data to Purdue University. 
Each of the sets of job evaluation data had 
been obtained with a system that included the 
three items mentioned above among a con- 
siderably larger number of items. All of the 
jobs represented were for hourly-paid indus- 
trial workers. Three of these installations 
were NEMA plans and the remaining five were 
other types of point rating systems. The 
total points for the abbreviated scale were 
calculated for each job from the equation 
given above. 


Analysis and Results 


Three types of analysis of the data were 
made. The first was a correlational analysis.
Table 1 presents the correlations for each 
company between the total points from the 
regression equation on the three items and the 
original total points. Examination of this
table shows that for those companies using 
the NEMA system the correlations were all 
.94 or above. Since these results were ob- 
tained from a multiple regression equation, the 
correlations presented correspond to multiple 
correlations between the abbreviated scale and 
the total point scale. It should be kept in 
mind, however, that since the multiple regres- 
sion equation used came from previous studies 
of other plant installations, the correlations ob- 
tained by no means represent the maximum val- 
ues that would have been obtained if the present 
data had been used in determining the multiple 
regression equation. The standard errors of 
the correlations are so small that any possi- 
bility of these representing chance fluctuations 


Table 1

Correlations Between Three-Item Total and Total Points

Company                 No. of Jobs
No. 1 NEMA              3
No. 2 NEMA              126
No. 3 NEMA

Non-NEMA Systems
No. 4 (12 items)
No. 5 (11 items)
No. 6 (10 items)
No. 7 (11 items)
No. 8 (23 items)

[The correlation column and the remaining job counts are not legible in the source.]


from zero is virtually eliminated. Therefore, 
these correlations represent the extent of agree- 
ment between a single abbreviated point scale 
and the NEMA scale as it operated in three 
separate companies. 
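
The correlational analysis can be sketched as a product-moment correlation between the predicted totals and the original totals for the jobs in one company. The function name and the sample figures below are illustrative assumptions; the study's own job data are not reproduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical predicted and original total points for five jobs.
    predicted = [150.0, 182.5, 210.0, 255.5, 301.0]
    original = [148, 190, 205, 260, 295]
    print(round(pearson_r(predicted, original), 3))
```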

The second part of Table 1 shows the correla- 
tions between the total points from the regres- 
sion equation on the three items and the 
original total points for the non-NEMA sys- 
tems studied. In view of the method used to 
derive this abbreviated scale, its application to 
the latter companies might be open to question. 
However, these correlations were determined 
to find how well this one shortened scale oper- 
ates for point scales which do not strictly 
follow the NEMA classifications and weights. 

All the non-NEMA company plans yielded 
correlations above .90 with one exception. 
Company No. 5, a power and light utility firm, 
used a system which placed more weight on the 
unavoidable hazards item than is true with the 
NEMA scale. This fact probably accounts for 
the lower correlation. 

Thus, the correlational analysis would seem 
to indicate that this abbreviated point scale 
has validity not only for the NEMA system 
but also for other point systems which are 
similar to the NEMA system and which include 
the items in the present abbreviated point scale. 

Another method of analysis was in terms of 
labor grade displacement. This analysis was 
made in two separate ways, the first of which 
determined labor grade displacement when the 
new total point values were determined from 
the regression equation previously discussed, 
and the size, number, and limits of the labor 
grades for the simplified scale were exactly 
the same as those on the original scale. There- 
fore, the labor grades of the company scale rep- 
resent the criterion against which the abbrevi- 
ated scale labor grades were compared. This 
analysis could be made only for the first six
companies mentioned in Table 1 because labor
grades were not available for companies No.
7 and 8.
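
A minimal sketch of how labor grade displacement might be tabulated is given here. It assumes each job has an integer labor grade on the original scale and on the abbreviated scale; the function name and sample grades are hypothetical.

```python
from collections import Counter


def displacement_percentages(original_grades, abbreviated_grades):
    """Per cent of jobs displaced by 0, 1, 2, and 3-or-more labor grades.

    The key 3 collects every displacement of three or more grades.
    """
    if len(original_grades) != len(abbreviated_grades) or not original_grades:
        raise ValueError("need two non-empty lists of the same length")
    counts = Counter(
        min(abs(o - a), 3) for o, a in zip(original_grades, abbreviated_grades)
    )
    n = len(original_grades)
    return {k: 100.0 * counts.get(k, 0) / n for k in range(4)}


if __name__ == "__main__":
    # Hypothetical labor grades for eight jobs.
    original = [1, 2, 2, 3, 4, 5, 6, 6]
    abbreviated = [1, 2, 3, 3, 5, 5, 4, 6]
    pct = displacement_percentages(original, abbreviated)
    print(pct)
    print("same or next adjacent grade:", pct[0] + pct[1])
```

The sum of the first two entries corresponds to the "same or next adjacent labor grade" standard used in the text.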

These results are summarized in Table 2. 
If the same or next adjacent labor grade is 
set as an acceptable standard, it can be seen 
that for the three NEMA installations 97 per 
cent, 96 per cent, and 94.5 per cent of the jobs 
meet this criterion. In company No. 2 there 
are 1.6 per cent of the jobs, representing only 





Cross Validation of Abbreviated Point Job Evaluation System 


two jobs, which would be displaced by three 
labor grades. 

These results compare favorably with those 
previously reported by Lawshe (4). In plant 
A of his study, 99.2 per cent of the jobs would 
meet the criterion of the same or next adjacent 
labor grade. 

For the three companies with the non-NEMA 
point rating scales, the labor grade displacement 
was much greater. Table 2 shows that the per- 
centages of jobs in the same or next adjacent 
labor grades for the three non-NEMA systems 
studied were, respectively, 42.6, 18.2, and zero. 

Such low correspondence might seem to in- 
validate the use of the abbreviated scale with 
non-NEMA systems. However, a final analy- 
sis was made which overcame much of this 
difficulty. This analysis involved a determina- 
tion of the range of points from the lowest to 
the highest evaluated job on the abbreviated 
scale. This range was then divided into the 
same number and with the same relative width 
of labor grades that existed for the same jobs 
on the original scale. Also, the lowest evalu- 
ated job on the abbreviated scale represented 
the lower limit of the first labor grade for the 
abbreviated scale. This method gave a consistent
and reproducible approach for setting
these labor grades.
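
The second method can be sketched as a linear rescaling of the original labor grade boundaries onto the abbreviated point range, which keeps the number of grades and their relative widths. The function names, the treatment of a score that falls exactly on a boundary, and the sample limits are assumptions made for illustration.

```python
from bisect import bisect_right


def rescale_grade_limits(original_limits, abbr_min, abbr_max):
    """Map grade boundaries [b0, ..., bk] onto the range abbr_min..abbr_max.

    original_limits holds the ascending boundaries of k labor grades on the
    original scale; relative grade widths are preserved.
    """
    lo, hi = original_limits[0], original_limits[-1]
    if hi <= lo or abbr_max <= abbr_min:
        raise ValueError("ranges must have positive width")
    scale = (abbr_max - abbr_min) / (hi - lo)
    return [abbr_min + (b - lo) * scale for b in original_limits]


def assign_grade(points, limits):
    """Return the 1-based labor grade for a point value.

    Assumption: a value on a boundary goes to the higher grade, and values at
    or beyond the ends of the range fall in the first or last grade.
    """
    grade = bisect_right(limits, points)
    return min(max(grade, 1), len(limits) - 1)


if __name__ == "__main__":
    # Hypothetical original boundaries for three labor grades, and a
    # hypothetical abbreviated range running from 80 to 180 points.
    limits = rescale_grade_limits([100, 150, 200, 300], abbr_min=80, abbr_max=180)
    print(limits)
    print(assign_grade(120, limits))
```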

Table 3 shows the results of this analysis. 
The percentage of jobs falling in the same labor 
grade, displaced one, two, and three or more 
labor grades is shown. 


Table 2

Labor Grade Displacement: Based on the Abbreviated Point Scale with
Number and Limits of Labor Grades the Same as in the Original System

                          Per Cent of Jobs
               Same      Displaced    Displaced    Displaced Three
               Labor     One Labor    Two Labor    or More Labor
Company        Grade     Grade        Grades       Grades
No. 1 NEMA     45.0      52.0          3.0
No. 2 NEMA     52.5      43.5          2.4          1.6
No. 3 NEMA     44.6      49.9          5.5
No. 4           9.8      32.8         37.8
No. 5           6.5      11.7         35.0

[The last-column entries for Nos. 4 and 5 and the row for No. 6 are not legible in the source.]



Table 3

Labor Grade Displacement: Based on Proper Number of Labor Grades
within the Abbreviated Point Range

                          Per Cent of Jobs
               Same      Displaced    Displaced    Displaced
               Labor     One Labor    Two Labor    Three Labor
Company        Grade     Grade        Grades       Grades
No. 1 NEMA     68.6      29.9          1.5
No. 2 NEMA     56.3      39.7                       1.6
No. 3 NEMA     44.8
No. 4          42.8
No. 5          31.2
No. 6          36.8

[The remaining entries are not legible in the source.]


Table 3 demonstrates that for the three 
NEMA system companies 98.5 per cent, 96 
per cent, and 94.2 per cent of the jobs fall into 
the same or next adjacent labor grade when 
the abbreviated point range method is used. 
However, these results do not differ materially 
from those obtained by the method which 
used for the abbreviated system labor grade 
classification the same size, number, and limits 
of labor grades as those used in the original 
system. 

For the three non-NEMA systems there is 
definite improvement in accuracy of job place- 
ment by labor grade when the second method 
for placement is used. The respective per- 
centages were 90.4, 80.5, and 86 for displace- 
ment of less than two labor grades. In addi- 
tion, only one job in company No. 6 would be 
displaced by three labor grades. 

The results thus show that, for the NEMA
systems, the agreement between the labor grade
placement of jobs by the long and the short
scale is essentially the same when the original
size and limits of labor grades are used for the
short scale as when a proper division of the
abbreviated point scale into labor grades is used. For
the non-NEMA systems, the proper allocation 
of labor grades within the abbreviated point 
range appears to result in greater agreement 
between labor grade placement by the long 
and short scales. 

Summary 


An abbreviated point job evaluation scale 
was constructed on the basis of prior research
on the NEMA job evaluation system. This
abbreviated scale was based upon the average 
of three multiple regression equations found 
in that earlier study. 

Eight job evaluation installations were 
studied for cross-validation of this one abbre- 
viated scale. These installations included 
three NEMA systems and five non-NEMA 
point systems. All the jobs represented by 
these plans were hourly-paid. 

Zero-order correlations between the total 
points of the abbreviated scale and the longer 
original scale were calculated for each of the 
eight installations. The amount of labor 
grade displacement was found by two methods. 
The first method compared the labor grade 
displacement when the labor grades for the ab- 
breviated system were set up using the same 
limits as the original installation. In the 
second method for comparison of labor grade 
displacement, the abbreviated scale labor 
grades were set up by dividing the abbreviated 
scale range into the same number of labor 
grades as the original scale used for those jobs. 

On the basis of this study, the following 
conclusions are supported: 


1. For the NEMA installations, the correla- 
tions between abbreviated scale total points 
and the original scale total points were .96, 
.95, and .94. 

2. This abbreviated scale will operate nearly 
as effectively for non-NEMA point job evalua- 
tion systems. With one exception, the ob- 
tained correlations of total points between the 
simplified scale and the non-NEMA point sys- 
tems were above .90. A main requirement in 
applying this abbreviated scale to non-NEMA 
systems is that these other systems have items 
which closely approximate the three chosen 
items. 

3. Labor grade displacement results show 
that 97 per cent, 96 per cent, and 94.5 per cent 
of the jobs in the NEMA installations would 
remain in the same or next adjacent labor 
grades when predicted from the abbreviated 
scale. These results do not change materially
when labor grades are set up using the abbreviated
point range.

4. For the non-NEMA point systems stud- 
ied, the superior approach was to divide the 
abbreviated scale point range into labor grades, 
the number and relative width of which are 
determined by the number and relative width 
of labor grades in the original scale. This 
method yielded 90.4 per cent, 80.5 per cent, and 
86 per cent in the same or next adjacent labor 
grade. 

5. From the results of the present study, it 
would seem that an abbreviated scale made 
up of three items—experience, hazards, and 
initiative and ingenuity—usually will achieve 
essentially the same results as a more extensive 
point system which includes these items. 


Received October 1, 1949. 


References 


1. Chesler, D. J. Abbreviated job evaluation systems
derived on the basis of “internal” and “external”
criteria. J. appl. Psychol., 1949, 33, 151-157.

2. Kress, A. L. How to rate jobs and men. Fact.
Mgmt., 1939, 10, 60-65.

3. Lawshe, C. H., Jr., and Satter, G. A. Studies in job
evaluation. 1. Factor analysis of point ratings
for hourly-paid jobs in three industrial plants.
J. appl. Psychol., 1944, 28, 189-198.

4. Lawshe, C. H., Jr. Studies in job evaluation. 2.
The adequacy of abbreviated point ratings for
hourly-paid jobs in three industrial plants. J.
appl. Psychol., 1945, 29, 177-184.

5. Lawshe, C. H., Jr., and Maleski, A. A. Studies in
job evaluation. 3. An analysis of point ratings
for salary paid jobs in an industrial plant.
J. appl. Psychol., 1946, 30, 117-128.

6. Lawshe, C. H., Jr., and Alessi, S. L. Studies in job
evaluation. 4. Analysis of another point rating
scale for hourly-paid jobs and the adequacy of
an abbreviated point scale. J. appl. Psychol.,
1946, 30, 310-319.

7. Lawshe, C. H., Jr., and Wilson, R. F. Studies in
job evaluation. 5. An analysis of the factor
comparison system as it functions in a paper
mill. J. appl. Psychol., 1946, 30, 426-434.

8. Otis, J. L., and Leukart, R. H. Job evaluation.
New York: Prentice-Hall, 1948.

9. Peters, C. C., and Van Voorhis, W. R. Statistical
procedures and their mathematical bases. New
York: McGraw-Hill Book Co., 1940.





Age and Route Sales Efficiency * 
C. B. Cover and S. L. Pressey 


Ohio State University 


In the past fifty years, the average length
of life in this country has increased 18 years,
from about 49 to 67; and the proportion of
people 45 and over has grown from 18% to 29%
(4, pp. 50 and 256). Also, “there has also been
a tendency over a long period of time to lower
the age at which workers retire from active
employment.” And pressure for pension plans
has accentuated a “wide-scale prejudice against
hiring of the worker over 45” (2, pp. 63 and 20).
The plight of the older workers thus mounts 
at the same time that their number increases. 

Fortunately, the situation is receiving in- 
creasing attention, as evidenced by mentions in 
newspapers and magazines, and the efforts of 
such groups as the Desmond committee of 
the New York legislature (2, 3). Considera- 
tion has naturally tended to center on large- 
scale industry. But other common types of 
work may present certain of these problems in 
acute form. An occupation which initially 
gave such good financial returns that it tended 
to hold men into middle life past the age of 
ready occupational shift, gave little by way of 
experience which might help in any shift, but 
then dropped them or became so burdensome 
that they dropped out, would seem to pre- 
sent such problems. If thereby the business 
lost older workers whose experience might 
in some way have been utilized, it also suffered. 
This paper attempts briefly to outline such a 
situation. 

Cases and Materials 


The present study deals with 92 men whose 
work consisted of house-to-house selling of 
foodstuffs from a truck, making collections, and 
canvassing for new customers. Each salesman 
operated two routes on alternate days and thus 
served each customer three times a week. 
Pay, on a commission basis, was excellent,—a 
hundred dollars or more a week. A successful 

* The data were gathered by Mr. Cover while a
student at Ohio State University; he is now on the staff
of Ohio University but in September, 1950, will become
Assistant Dean of Students at Muskingum College.

salesman needed to have a pleasing personality, 
skill in selling, and enterprise in soliciting new 
customers. Presumably, all this involved 
knowledge of his goods, of demand in different 
neighborhoods, and of the likes of each cus- 
tomer. Since many of the sales were on credit, 
the man needed to be shrewd in his appraisal 
of customers; he needed also to estimate closely 
goods needed on each run so that there would 
be enough, but little surplus which might spoil 
or be salable only at a discount. He had also 
to be careful and efficient in his handling of his 
truck. Finally, the route salesman needed the 
physical stamina required for repeatedly get- 
ting in and out of his truck and going up and 
down steps carrying a basket loaded with 
merchandise, eight or nine hours a day, six 
days a week. 

The findings are based on rank order ratings 
by sales supervisors of these men in: (1) total 
sales; (2) credit; (3) surplus; (4) truck repair 
and accidents; and (5) time. Ratings were 
based largely upon records and were thus fairly 
objective. 

Results 


The important features of the findings are 
exhibited in the following tables. In Table 1, 
the number at each age, as shown by the row 
of figures next to the bottom, should first be 
noted. Most men were in their twenties and
thirties, with few staying beyond these years
and only one (included with those in their
fifties) past 60. Moreover, the first part of
Table 1 shows that the older men who did 
continue fell off markedly in the basic issue of 
amount of sales. The twenty best salesmen 
are all under 40. Median rank drops markedly 
from the thirties to the fifties. A similar fall- 
ing off was found in efficiency in use of time. 
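
The grouping behind Table 1 can be sketched as a median of rank-order positions within each age decade. The function name and sample records are hypothetical; placing men past 60 with the fifties follows the text, while the exact grouping rule used by the authors is an assumption.

```python
from collections import defaultdict
from statistics import median


def median_rank_by_decade(records):
    """Median rank order for each age decade.

    records is a list of (age, rank) pairs, where rank 1 is the best salesman.
    Ages of 60 and over are grouped with the fifties, as in the text.
    """
    groups = defaultdict(list)
    for age, rank in records:
        decade = min((age // 10) * 10, 50)
        groups[decade].append(rank)
    return {decade: median(ranks) for decade, ranks in sorted(groups.items())}


if __name__ == "__main__":
    # Hypothetical (age, sales rank) pairs, not data from the study.
    sample = [(24, 40), (28, 12), (33, 5), (37, 30), (45, 61), (52, 80), (61, 88)]
    print(median_rank_by_decade(sample))
```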

In contrast, the second part of Table 1 shows 
increasing efficiency with age in truck opera- 
tion, as shown by repair costs and accidents, 
up through the forties; the twenty-two drivers 
who were poorest in these respects were all 
under 40. There was a similar improvement
with age in handling of credit, while judgment
regarding surplus remained about the same. 
In short, though judgment appears to improve 
with age, a falling off in energy appears to 
cause a decrease in effectiveness in route 
salesmen. 

The question naturally arises as to whether, 
if years of service instead of simply age were 
considered, ratings would be different. Table 
2 so groups the cases, and in terms of ratings 
for total efficiency. Here the poorer ratings of 
the older salesmen are even more clearly ex- 
hibited! The two who have been salesmen 
longest are among the poorest. 

The total task involved in these route sales 
jobs thus seems to be such that men do not 
maintain their efficiency past early middle life 
and most drop out. What happens to them? 
The truck sales positions pay very well and 
they have high status. Few of the men are 
willing to drop back to a much more poorly 
paying sales position in one of the company’s 
stores. And as the business is organized, there 
are few supervisory or production jobs into 
which these men can go. As a result, though 
intimately familiar with the product sold and 
with sales problems, and potentially of con- 
tinuing use to the firm, they commonly leave 
the business in their 40’s. Some of them 


Table 1

Rank Order in Total Sales and Truck Operation and Accidents of
92 Route Salesmen, as Related to Age *

Rank Order      Age and Total Sales           Age and Truck Operation
                20-9   30-9   40-9   50-9     20-9   30-9   40-9   50-9

[The frequencies for rank-order intervals 1-10 through 91-100 are not legible in the source.]

Total (legible values, in order): 29, 42, 13
Median (legible values, in order): 52, 36, 46, [illegible], 41, 18, 36

* Median rank for each age group is shown by N in
bold-face type.



Table 2

Total Efficiency of 92 Salesmen in Relation to
Years of Service *

Rank Order      Years of Service
                [illegible]   10-19   20-29

[The frequencies for rank-order intervals 1-10 through 91-100 are not legible in the source.]

Total           75            8       [illegible]
Median          43            36      68

* Median rank order for each length of service group
is shown by N in bold-face type.


thereafter may do well. But most of them, 
so far as known, find no other satisfactory 
position at that age. And they become dis- 
satisfied subsistence farmers, insurance sales- 
men, handymen, or otherwise on the margin 
of things vocationally. 


Discussion 


The data are indeed inadequate. Having 
been obtained after the war, they are un- 
doubtedly influenced in various ways by cir- 
cumstances related thereto. About the later 
careers of all too large a portion of the salesmen
who had left, nothing was known. But known
outcomes were so predominantly unsatisfactory 
that they seemed to point up certain problems 
of management and employee welfare. Might 
truck sales work be in some way made less 
fatiguing for older men, so that they might 
continue longer? Might the concern find more 
positions in its own organization into which 
these men might shift? Could the men’s atti- 
tudes regarding the status of indoor selling jobs 
be so modified that they would be more willing 
to take them? Or could employment offices 
find more positions into which men with such 
experience could go, when they are in their 
forties or fifties? Situations of the type de- 
scribed in this brief paper seem not economi- 
cally healthy, and in need of continuing study. 







Summary 


With the increasing length of life and number 
of older workers in this country, it becomes in- 
creasingly important to investigate the rela- 
tions of efficiency in different types of work to 
age, and to consider means by which indi- 
viduals who are in occupations not feasible for 
the older years may find other vocational 
opportunities. 


1. A study of 92 men selling foodstuffs from 
retail trucks showed increased efficiency with 
age in handling their trucks and in judgment in 
business relations with their customers (as in 
handling of credit), but a falling off in sales. 
The drain on physical energies involved in the 
work appeared to be the chief factor. 

2. These men tended to leave such positions 
after 40; but most did not find other oppor- 
tunities in the firm nor (so far as known) 
locate satisfying employment elsewhere. 




3. The question is raised as to whether or 
not such work might be adapted so that older 
men could continue longer or other positions 
in the same firm found for them, which would 
utilize their experience. If not, might em- 
ployment offices find opportunities for such 
men which, to some degree, would take account 
of their vocational background, and be more 
suited to their age and needs? 


Received May 22, 1950. 
Early publication. 


References 


1. Clague, E. After 45,—How about a job? Survey
Graphic, 1950, 86, 173-176.

2. Desmond, T. C., and others. Birthdays don’t count.
New York State Joint Legislative Committee on
Problems of the Aging, Newburgh, N. Y., 1948.

3. Desmond, T. C., and others. Never too old. New
York State Joint Legislative Committee on
Problems of the Aging, Newburgh, N. Y., 1949.

4. Dublin, L. L., and others. Length of life (Rev. Ed.).
New York: Ronald Press, 1949.





Ortho-Rater Norms and Sex Differences 
J. H. Ely, N. C. Kephart, and Joseph Tiffin 


Occupational Research Center, Purdue University


The growing use of the Ortho-Rater as an
industrial vision testing instrument has suggested
the need for male and female industrial
norms on the vision tests included in this instrument.
It is the purpose of the present
article to present such norms, based on quite
Table 1

Percentile Norms on Ortho-Rater Distance Acuity Tests

[Columns: Score; Male and Female, each subdivided into Both Eyes, Right Eye, Left Eye, and Worse Eye; N at the foot of each column. The table entries are not legible in the source.]


Table 2

Percentile Norms on Ortho-Rater Near Acuity Tests

[Columns: Score; Male and Female, each subdivided into Both Eyes, Right Eye, Left Eye, and Worse Eye; N at the foot of each column. The table entries are not legible in the source.]







Table 3

Percentile Norms on Ortho-Rater Vertical Phoria Tests

[Columns: Score, running from Left Hyperphoria to Right Hyperphoria; Male Far, Male Near, Female Far, Female Near; N at the foot of each column. The table entries are not legible in the source.]


large samples of industrial employees. A 
second purpose is to compare the mean scores 
and variability of men and women selected 
randomly from plants currently using the 
Bausch and Lomb Industrial Vision Service. 


Norms 


Tables 1 to 5, inclusive, give percentile norms 
on the several Ortho-Rater tests. These 


Table 4

Percentile Norms on Ortho-Rater Lateral Phoria Tests

[Columns: Score, running from Esophoria to Exophoria; Male Far, Male Near, Female Far, Female Near. The table entries are not legible in the source.]


Table 5

Percentile Norms on Ortho-Rater Stereopsis and Color Tests

[Columns: Stereopsis Test (Score, Male, Female); Color Test (Score, Male, Female); N at the foot of each column. The table entries are not legible in the source.]


norms are based on approximately 7,600 men 
and 2,500 women. At the bottom of each 
column in the tables is given the exact number 
of cases on which the individual test norm is 
based. The values in the tables are percentiles. 

Table 6 summarizes the means, S.D.’s, and 
differences between males and females for the 
several tests in the Ortho-Rater. Quite a num- 
ber of the differences, both in means and S.D.’s, 
are significant at the 5% level or below. 
Since the number of cases was large, several of
the differences which are actually very small
were found to be significant at the 5% or even
1% level. Such differences are probably of 
more theoretical than practical importance. 
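
The effect of sample size on significance can be illustrated with a critical ratio for the difference between two independent means. The article does not print its formula, so the standard-error expression below is the conventional one and is an assumption here; the function name and the sample figures are hypothetical.

```python
from math import sqrt


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two independent means divided by its standard error."""
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


if __name__ == "__main__":
    # The same small difference in means (0.1 point) with hypothetical S.D.'s of 2.0.
    small_samples = critical_ratio(9.1, 2.0, 76, 9.0, 2.0, 25)
    large_samples = critical_ratio(9.1, 2.0, 7600, 9.0, 2.0, 2500)
    print(small_samples, large_samples)
```

With groups a hundred times larger the critical ratio is ten times greater, which is why a difference of little practical importance can still reach significance in samples of this size.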

It will be noted that in the color vision test 
a difference in favor of the men was found. 
This difference was significant at the 1% level. 
The authors are aware that this finding is 
contrary to long accepted theories and facts 
about the distribution of color blindness among 
the sexes. The explanation for this difference 
in findings is not known. 

It should be kept in mind that while the data 
of this study were based on randomly selected 
men and women employed on industrial jobs, 
they may not be typical of randomly selected 
men and women from the general population. 
Traditional industrial practices of placing men 
and women on different types of jobs may have 


SN ee 








J.H. Ely, N.C. Kephart, and Joseph Tiffin 


Table 6 
Comparison of Male and Female Means and Standard Deviation on Ortho-Rater Vision Tests 


Note: Numbers of cases in various categories shown at bottom of Tables 1 to 5. 


Differ- 
ence 


1.05 
9 
9 
#9 


Male Female 





9.64 
912 
8.97 
8.27 
947 
9.03 
9.12 


Far Acuity, Both 
Far Acuity, Right 
Far Acuity, Left 
Far Acuity, Worse 
Near Acuity, Both 
Near Acuity, Right 
Near Acuity, Left 
Near Acuity, Worse 
Far lateral phoria 
Far vertical phoria 
Near lateral phoria 
Near vertical phoria 
Color 

Stereopsis 


5.08 
5.23 

* Significant at 1% level. 
** Significant at 5% level. 


resulted in certain secondary selection factors 
which would operate to make any randomly 
selected group of industrial employees (particu- 


18.53" 


Differ- 
S D..male S.D female ence 
2.07 2.07 00 
2.63 2.52 Al 
2.70 2.57 13 
2.94 2.09 25 
2.12 1.60 52 
2.77 2.12 65 
2.94 2.35 59 
3.09 2.46 63 
2.56 2.71 15 
1.28 10 
3.23 09 
03 


CR. 
21.88° 
16.80° 
1.50 
15.47* 
1.00 
2.64* 
1.72°* 
00 
2.90* 
63 
5.00* 
87 
14.81" 


06 


larly women employees) different in certain 
respects from the general population. 


Received October 1, 1949. 





Fluorescent Light Versus Daylight 


J. Stanley Gray and Paul Prevetta 
University of Georgia


Fluorescent lighting is comparatively new 
and few studies have been made of its effects 
on human vision. Luckiesh and Moss (1) 
compared fluorescent lights with tungsten 
lights of 20 foot-candles intensity by having 10 
subjects read for 30-minute periods on 9 dif- 
ferent days. The number of involuntary eye 
blinks during the first and last five minutes of 
each reading period was counted on the as- 
sumption that the frequency of blinks is related 
to the difficulty of seeing and the state of 
visual fatigue. They found that the increase 
in blinks was approximately the same for fluo- 
rescent lights as for tungsten lights. The small 
differences in increased blinking were not sta- 
tistically significant. 

However, Tinker (4) has criticized the rate 
of blinking as being a non-valid index of the 
visual function. He had 60 university stu- 
dents read under 10 foot-candles of well- 
diffused light and found that “the frequency of 
blinking is an unsatisfactory criterion of the 
readability of print.” McNally (3) studied 
the relation of blink rate to reading various 
sizes of type and found no correlation. The 
rate of blinking was unrelated to the size of 
type. McFarland, Holway, and Hurvich (2) 
found an inconsistent relationship between the 
rate of blinking and both the duration of read- 
ing and the intensity of illumination. They 
concluded that blinking is not a reliable index 
of visual fatigue. 

The authors of the present study used the 
American Optical Company’s Sight-screener to 
measure the effects of two hours of continuous 
reading under daylight as compared with two 
hours of continuous reading under fluorescent 
lights. The visual functions measured were 
acuity, stereopsis, and both lateral and vertical 
phoria at 14 inches and at 20 feet distances. 
Fifty subjects (high school, college, and older 
adults) read books set in 8 point type for two 
hours under daylight of 20 foot-candles inten- 
sity and for two hours under fluorescent lights 
of 20 foot-candles intensity. For both reading
sessions, the light source was behind and above;
books were kept on an easel at right angles to 
the line of vision; intensity was maintained at 
20 foot-candles (venetian blinds were used to 
regulate the daylight); glare was eliminated by 
painting all peripheral background pastel 
green; the same book was read under both con- 
ditions; reading booths reduced outside dis- 
tractions; and the visual skills were measured 
(using the Sight-screener) at the beginning, at 
the end of one hour of reading, and at the end 
of two hours of reading for both sessions. 
Some of the subjects read first under fluorescent 
lighting and then under daylight, and others 
read first under daylight and then under 
fluorescent lights. 

There was both loss and gain in all four 
visual skills at both near and far measure- 
ments. Some subjects actually increased in 
certain visual skills after the two-hour reading 
period although most subjects lost in all skills 
except stereopsis. The summarized results are 
shown in Table 1. The differences between 


Table 1

Visual Effects of Two Hours of Reading under Daylight
and Fluorescent Light for Fifty Subjects

                          Daylight          Fluorescent
                          (Mean Change)     (Mean Change)
Near Distance (14 inches)
  Acuity*
  Stereopsis*
  Lateral Phoria†
  Vertical Phoria†

Far Distance (20 feet)
  Acuity*
  Stereopsis*
  Lateral Phoria†
  Vertical Phoria†

[The mean-change entries are not legible in the source.]

* Data in “units of change.”
† Data in diopters.










daylight and fluorescent light were very small 
and none was statistically significant. 


Summary 
The conclusion seems to be justified that, as
measured by the Sight-screener, fluorescent
light of 20 foot-candles intensity is not inferior
to daylight of the same intensity for reading
8 point type material for two hours’ duration.


Received May 22, 1950 
Early publication 


References 


1. Luckiesh, M., and Moss, F. K. Vision and seeing
under light from fluorescent lamps. Illum.
Engng., N. Y., 1942, 37, 81-88.

2. McFarland, R. A., et al. Studies in visual fatigue.
Cambridge, Mass.: Harvard School of Business
Administration, 1942.

3. McNally, H. J. The readability of certain type
sizes. Teach. Coll. Contr. Educ., 1943, No. 883.

4. Tinker, M. A. Validity of frequency of blinking as
a criterion of readability. J. exper. Psychol.,
1946, 36, 453-460.





Inconsistency in the Predictive Value of a Battery of Tests 


Robert M. W. Travers 
Division of Teacher Education, Board of Higher Education, New York City 


and 
Wimburn L. Wallace 


University of Massachusetts 


The increased number of applicants for ad- 
mission to professional schools in recent years 
has made administrators keenly aware of prob- 
lems of selection. A growing dissatisfaction 
has been felt with the traditional criteria for 
selection, such as undergraduate grades or 
letters of recommendation, and some attention 
has been devoted to the possibility of using 
tests for selective purposes. Various large- 
scale testing programs have been initiated for 
this purpose in many professional areas, while 
in others, smaller programs have been devel- 
oped for studying the problem of selection. 
The schools of dentistry have shown admirable 
and exemplary caution in this matter, and 
while the Association of Schools of Dentistry 
has sponsored an experimental testing program, 
it has not introduced the tests as a requirement 
for admission, prior to adequate investigation. 
The use of tests for the selection of students 
of dentistry is still largely experimental. 

During 1947 and 1948, the University of Michigan administered tests to applicants for admission to its School of Dentistry. The primary purpose of this battery was to explore possible improvements in the admission system. The tests were selected on the basis of previous studies which have now been adequately reviewed in a bulletin published by the Veterans Administration¹ and which need not be reviewed again here. The one exception in this procedure was the inclusion of a test of “Effectiveness of Expression” which, it was believed, might measure a competency important to the dentist though possibly of little importance in achieving satisfactory grades in dental school.

The following tests were included in the battery: ACE Psychological Examination, 1945 Edition; MacQuarrie Test for Mechanical Ability; Bennett Test of Mechanical Comprehension, Form BB; Revised Minnesota Paper Form Board Test, Series MB; Interpretation of Reading Materials in the Natural Sciences, College Level, Tests of General Educational Development; and Effectiveness of Expression, Cooperative English Test B2, Form T.

¹ Predicting success in training for dentistry. Veterans Administration Technical Bulletin TB 7-44, July 8, 1947, pp. 16.

The tests were administered at the University of Michigan and at centers throughout Michigan. A few centers were also established outside of the State. In the spring 1947 testing program there were 32 centers, but when the program was given again in the spring of 1948 these centers were reduced to 18. At each center the appointed examiner was given a set of directions to be read verbatim to the examinees.

The scores on the tests were correlated with 
the subsequent achievement of those admitted. 
These correlation coefficients are given in 
Table 1. 

Table 1

Coefficients of Correlation Between Honor Point Ratio and Test Scores

                               1947 Admissions (N = 82)    1948 Admissions (N = 79)
                               1st Year; 2nd Year          1st Year
Tests                          Honor Point Ratio           Honor Point Ratio
ACE Quantitative Score         -.08                        [illegible]
ACE Linguistic Score           -.09                        .39
ACE Total Score                -.10                        .48
MacQuarrie                     -.10                        .27
Mechanical Comprehension       .07                         .28
Paper Form Board               .05                         .25
GED Natural Sciences           .20; .19                    [illegible]
Effectiveness of Expression    .01; .02                    .36

[For the first six tests only one of the two 1947 coefficients is legible in this copy.]


This table shows an apparent inconsistency: the tests had some predictive value when they were given to the group admitted in 1948, but practically no predictive value when they were given to the previous class. A number of hypotheses suggest themselves concerning this inconsistency, the most obvious one being that the tests were improperly administered during 1947 but were properly administered in 1948. This hypothesis is easily tested since a group of tests, similar in many respects to the tests discussed, was administered to all students after admission through a testing program sponsored by the American Dental Association. If the tests given in 1947 were improperly administered one could expect that they would have low validity, but that the battery administered by the American Dental Association would not be affected by this factor and should have validity comparable with that of the tests given in 1948.

The correlations of the tests in the American Dental Association battery with honor point ratio are summarized in Table 2.

The tests of the American Dental Association show the same phenomenon exhibited by the University of Michigan tests. They have little predictive value for those admitted in 1947 but substantial value in 1948. Hence, it is unlikely that improper administration of the 1947 battery can account for the phenomenon.

The next hypothesis concerning the change in the predictive value of the tests from 1947 to 1948 is that the 1947 group exhibited a more limited range of ability. Table 3 shows the means and standard deviations of the scores of the two groups.


The data presented do not support the hy- 
pothesis that the lack of predictive value of 
the tests given in 1947 is due to a restricted 
range of test scores or of achievement (HPR). 

A third hypothesis is that the system of 
grading changed from 1947 to 1948. This 
hypothesis does not seem to be consistent with 
the evidence. A discussion of the problem 
with administrative officials indicated that no 
changes had been made in the grading system. 

Table 2

Coefficients of Correlation Between American Dental Association Tests and Honor Point Ratio

                           1947 Admissions                 1948 Admissions
                           1st Year        2nd Year        1st Year
Tests                      H.P.R.          H.P.R.          H.P.R.
ACE Quantitative Score     .09             [illegible]     .35
ACE Linguistic Score       .09             [illegible]     .27
ACE Total Score            .05             .00             [illegible]
GED Reading                .18             .18             .34
GED Spelling               -.03            -.03            .42
English, Total             .05             .04             [illegible]
Object Visualization       .25             .23             .25
Natural Science Survey     .26             .29             .44


Table 3

Standard Deviation and Mean of Test Scores of Those Admitted in 1947 and 1948

                               Standard       Standard
                               Deviation      Deviation      Mean
ACE Quantitative Score         7.94           [84]           50.5
ACE Linguistic Score           12.94          11.8           78.2
ACE Total Score                18.23          17.05          128.7
MacQuarrie                     10.46          12.25          78.0
Mechanical Comprehension       10.87          8.73           37.0
Paper Form Board               7.25           [6.609]        48.3
GED Natural Sciences           11.00          10.82          50.3
Effectiveness of Expression    10.36          [illegible]    [WZ] 43.6
Honor Point Ratio (1st Yr.)    0.45           0.44           1.53

[The original table gives a mean and a standard deviation for each of the 1947 and 1948 groups. Only three columns survive in this copy, and the year to which each belongs cannot be determined; bracketed entries are garbled.]


A fourth hypothesis is that the group selected in 1947 was in some way different from that selected in 1948. Differences between the mean scores of the two groups are small and substantially the same part of the measuring scale was used for prediction in the two cases. It is, however, possible that in the case of the group admitted in 1947, there might have been a tendency on the part of the admissions officer to accept either high grades in pre-professional work or high test scores for admission. If this were the case to any extent it might result in the selection of a group in which there was a correlation of zero between previous scholastic record and test scores. This seems to be the case, for as Table 4 demonstrates, the group selected in 1947 shows a generally lower correlation between previous scholastic record and test scores than those selected in 1948.


Table 4

Coefficients of Correlation Between Test Scores and Average Grade in Pre-professional Work

Columns: 1947 Admissions; 1948 Admissions

Rows: ACE Quantitative Score; ACE Linguistic Score; ACE Total Score; MacQuarrie; Mechanical Comprehension; Paper Form Board; GED Natural Sciences; Effectiveness of Expression

[The coefficients are not legible in this copy.]


If the type of selective process under discussion were operative in 1947 one would expect not only that the group selected would show exceptionally low correlations between test scores and previous scholastic record but also that there might be exceptionally low correlations between test scores and subsequent performance.


Summary 


This study has been presented to illustrate 
the fact that the process of validating tests for 
admission to an educational program by finding 
the correlation with subsequent grades needs 
to be more carefully scrutinized. It is 
possible that the correlations between test 
scores and grades may be zero although the 
instrument is valid for selection purposes. 
The process of selection may control the size 
of validity coefficients and should be properly 
controlled in any experimental validation of 
a test. 
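
The selection effect described in this study can be illustrated with a small simulation. The sketch below is not taken from the study: the data model (one latent ability plus independent error for test score, pre-professional grades, and later honor point ratio), the cutoff, the pool size, and all function names are assumptions chosen only for illustration. Applicants are admitted when either the test score or the grade average exceeds the cutoff, and correlations are then compared between the full applicant pool and the admitted group.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_applicants=20000, cutoff=0.0, seed=1950):
    """Compare correlations in the applicant pool and in an either/or admitted group.

    Hypothetical model: each measure is a shared ability plus independent error.
    """
    rng = random.Random(seed)
    pool = []
    for _ in range(n_applicants):
        ability = rng.gauss(0, 1)
        test = ability + rng.gauss(0, 1)      # admission test score
        grades = ability + rng.gauss(0, 1)    # pre-professional grade average
        hpr = ability + rng.gauss(0, 1)       # later honor point ratio
        pool.append((test, grades, hpr))

    # Either/or rule: a high test score OR high grades is enough for admission.
    admitted = [p for p in pool if p[0] > cutoff or p[1] > cutoff]

    def summarize(group):
        test, grades, hpr = zip(*group)
        return {
            "test_vs_grades": pearson(test, grades),
            "test_vs_hpr": pearson(test, hpr),
        }

    return summarize(pool), summarize(admitted)


pool_r, admitted_r = simulate()
print("applicant pool:", pool_r)
print("admitted group:", admitted_r)
```

Under these assumptions both the correlation between test scores and prior grades and the validity coefficient (test scores against later honor point ratio) come out lower in the admitted group than in the pool, although the test is equally informative in both. Raising the cutoff makes the contrast sharper.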


Received October 7, 1949. 








Intercorrelations in Merit Rating Traits 


C. E. Jurgensen 


Minneapolis Gas Company 


Rating scales used by different companies 
vary considerably in the number of traits rated. 
In an analysis of 132 rating scales, Mahler (6) 
found that the number of traits varied from one 
to thirty-three with a mean of 9.3. Com- 
panies using such scales appear to make two 
assumptions: (1) the traits rated are important 
from management’s point of view, and (2) 
ratings on the various traits are discrete. This 
study deals only with the second of these 
assumptions. 

Kornhauser (5) reported intercorrelation of 
instructors’ ratings on 68 students who were 
rated on intelligence, industry, accuracy, co- 
operativeness, initiative, moral trustworthi- 
ness, and leadership. Intercorrelations ranged 
from .45 to .83 with a median of .69. 

Driver (2) reported a study of a ten trait merit rating scale from which several traits had been discarded because ratings were not discrete. The intercorrelations of the remaining ten traits (based on N’s varying from 100 to 300) ranged from .11 to .79 with a mean of .46.

Ewart, Seashore and Tiffin (3) reported a 
study of ratings of 1120 men on a twelve trait 
scale covering safety, knowledge of job, versa- 
tility, accuracy, productivity, overall job per- 
formance, industriousness, initiative, judg- 
ment, cooperation, personality, and health. 
Intercorrelations on the twelve trait scale 
ranged from .25 to .88 with a median of .75. 
Reliability coefficients were not reported. A 
factor analysis showed that a general factor 
(ability to do the present job) accounted for 
most of the total variance. A second, and 
oblique, factor was tentatively named ‘skill 
possessed over and above the requirements for 
the specific job.” The third and final factor 
(health) was discarded as an artifact. 

Bolanovich (1) made a factor analysis of ratings on 143 field engineers whose work required a wide range of abilities and characteristics. They were rated on fourteen traits: personality, personal appearance, punctuality, thoroughness, efficiency, resourcefulness, dependability, cooperation, job attitude, technical ability, sales ability, organizing ability, judgment, and desire for self improvement. Intercorrelations ranged from .05 to .73 with a median of .49. A factor analysis resulted in six common factors: attendance to detail, ability to do the present job, sales ability, conscientiousness, organizing or systematic tendency, and social intelligence. The correlation between factors was not reported.

Intercorrelations given in the above studies appear to be typical of those usually reported when rating scales consist of relatively narrow and specifically defined traits. Such narrowness is generally believed to be advisable. For example, Paterson (7) has said “each trait to be rated should be restricted to a single type of activity or to the results of a single type of activity, otherwise the ratings will be ambiguous.” Although much can be said in favor of restricted traits, the fact remains that intercorrelations have generally been high. The hypothesis was therefore proposed that relatively independent factors would be obtained if each factor consisted of a cluster of sub-traits which might logically be expected to be highly correlated in the rating situation, and if the major factors could logically be expected to be relatively discrete in actuality. The testing of this hypothesis was a primary purpose of the research reported here.


The hypothesis was also proposed that reported intercorrelations between traits are spuriously high due to differences between raters in leniency and variability. The possible effect of such differences seems to have been given consideration with respect to interpretation of ratings on specific employees, but to have been ignored consistently with respect to intercorrelation of traits. The testing of this hypothesis was also a primary purpose of the research reported here.


Description of Scale 


The rating scale developed for this study 
consisted of six traits which were thought to be 
reasonably discrete plus ratings on overall 
value and potentiality. The nature of the 
scale is illustrated by the first trait: 

Work Habits. Consider initiative, industry, persistence, accuracy, carefulness of detail, orderliness, thoroughness, speed, punctuality and other related work habits.

0—Unsatisfactory, does not meet job requirements
2—Below average, partially meets job requirements
4—Average, meets job requirements
6—Above average, exceeds job requirements
8—Outstanding, far exceeds job requirements

Definitions of other traits used in the scale are given below. Descriptions of degrees were similar in nature to those given under Work Habits.


Attitudes. Consider employee's enthusiasm, 
interest, loyalty, willingness to cooperate, team- 
work, open mindedness, tolerance, ability to 
accept criticism, and other such attitudes 
toward the job, fellow workers, supervisor, and 
company. 

Acceptance by Others. Consider whether this 
employee is liked and respected by co-workers 
and public, how they react to his appearance, 
voice and manner, tact, courtesy, agreeable- 
ness, friendliness, etc. 

Self Control. Consider his emotional matu- 
rity and balance, disposition, stability, calm- 
ness, maturity of action, etc. 

Mental Ability. Consider his alertness, rea- 
soning ability, common sense, memory, judg- 
ment, analytical ability, ability to express self 
orally or in writing, mechanical intelligence, 
ease of learning new work and remembering 
instructions, and other similar aspects of in- 
telligence. 

Physical Ability. Consider his agility, coor- 
dination, dexterity, vision, hearing, endurance, 
energy, strength, overall health, etc. 

Overall Rating. Decide whether he is currently an asset or liability to your department, considering all evidence (whether mentioned in above characteristics or not) which affects his all-around performance at the present time.

Potentiality. To what extent does this employee possess capacity for future growth? Consider his all-around performance at the present time in light of his age, health, experience, etc., and the extent to which these and other factors will foster or prevent future development.


Method 


Instructions to raters required them to 
decide to what degree the employee possessed 
the trait under consideration and to encircle 
the number which best described him with re- 
spect to the position he held. Ratings on all 
traits were on a nine point scale, each de- 
scription including one or two equivalent 
phrases. Descriptions of degrees were selected 
on the basis of a rough application of Thurs- 
tone’s method for scaling equal appearing 
intervals (4). 

Results reported here are based on ratings 
of 199 employees by 26 supervisors. The em- 
ployees were all hired within a six week period, 
were all on the same job, and were randomly 
divided into 26 crews having seven or eight 
employees each. The work was such that 
supervisors had considerable contact with 
their crew members on an individual basis, but 
relatively little contact with their crew as a 
unit. Ratings were obtained three months 
after employment and again five months after 
employment. Reliability coefficients for the 
eight traits were obtained from the repeated 
ratings. 

Each supervisor had rated seven or eight men on two occasions on eight traits. Therefore approximately 120 ratings by each supervisor were available. Inspection of these ratings showed that supervisors differed considerably both in mean rating and variability of ratings. On the nine point rating scale, mean ratings by supervisors ranged from 3.7 to 7.0. Variability, in terms of sigma, ranged from .8 to 2.1. All ratings were converted to standard scores (having a mean of 50 and sigma of 10) on the assumption that differences in ratings were due to differences between raters rather than employees. The further assumption was made that a rater who differed from others in leniency or variability on one trait would differ similarly on other traits. These assumptions appeared tenable in this situation.
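
The conversion to standard scores described above can be sketched as follows. This is an illustration only: the function name, the sample ratings, and the use of the population form of sigma are assumptions, since the study does not give its computing formula. Each supervisor's ratings (pooled over all traits and both occasions, in line with the stated assumption that a rater's leniency carries across traits) are rescaled to a mean of 50 and a sigma of 10.

```python
from statistics import mean, pstdev


def to_standard_scores(ratings_by_rater, target_mean=50.0, target_sigma=10.0):
    """Rescale each rater's ratings to a common mean and sigma.

    ratings_by_rater maps a rater label to the list of all ratings that
    rater gave. Population sigma is used here (an assumption).
    """
    converted = {}
    for rater, ratings in ratings_by_rater.items():
        m = mean(ratings)
        s = pstdev(ratings)
        if s == 0:
            raise ValueError(f"rater {rater!r} gave identical ratings; sigma is zero")
        converted[rater] = [target_mean + target_sigma * (r - m) / s for r in ratings]
    return converted


# Hypothetical ratings on the 0-8 scale from a strict and a lenient supervisor.
raw = {
    "strict_rater": [2, 3, 3, 4, 5, 3, 4, 4],
    "lenient_rater": [6, 7, 8, 7, 6, 8, 7, 7],
}
standard = to_standard_scores(raw)
for rater, scores in standard.items():
    print(rater, [round(x, 1) for x in scores])
```

After conversion every rater has the same mean and sigma, so remaining differences between scores reflect only a rater's ordering and spacing of his own employees.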


Results 


Intercorrelations of ratings expressed in standard score form are given in Table 1. They range from .33 to .84 with a median of [illegible]. These same intercorrelations expressed in original raw score ratings range from .60 to .88 with a median of .76. Each of the 28 intercorrelations dropped in size when ratings were expressed in standard score form, the range of drop being from .01 to .44 with a median of .16.


Table 1

Reliability and Intercorrelations of Ratings, Standard Score Form (N = 199 Conversion Employees)

Columns: Work Habits; Attitudes; Acceptance by Others; Self Control; Mental Ability; Physical Ability; Overall Rating; Potentiality

                         Work Habits    Attitudes
Work Habits              .55
Attitudes                .66            .54
Acceptance by Others     [illegible]    .33
Self Control             .68            .68
Mental Ability           .65            [illegible]
Physical Ability         .53            .52
Overall Rating           .84            .62
Potentiality             .63            .55

Median                   .65            [illegible]

[Only the first two columns can be read with their row labels in this copy; the remaining columns and the Mean and Sigma rows are not recoverable.]

Reliabilities of ratings in standard score form are also given in Table 1. They range from .45 to .68 with a median of .55. Reliabilities computed from original raw score ratings range from .68 to .75 with a median of .73. Each of the eight reliabilities dropped in size when ratings were expressed in standard score form, the range in drop being from .07 to .25 with a median of .18.
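
Why correlations computed from pooled raw ratings can exceed those computed from standard scores may be seen in a small constructed example. The ratings below are invented for illustration and are not data from this study; the helper names are likewise hypothetical. Two supervisors each rate three men on two traits. Within each supervisor the two traits are uncorrelated, but one supervisor is lenient on both traits and the other strict on both.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented ratings: (trait_1, trait_2) for three employees per supervisor.
ratings = {
    "strict": [(1, 2), (2, 1), (3, 2)],
    "lenient": [(5, 6), (6, 5), (7, 6)],
}

# Correlation between the two traits from pooled raw ratings.
raw_t1 = [t1 for crew in ratings.values() for t1, _ in crew]
raw_t2 = [t2 for crew in ratings.values() for _, t2 in crew]
raw_r = pearson(raw_t1, raw_t2)

# Convert each supervisor's ratings (all traits pooled) to mean 50, sigma 10.
std_t1, std_t2 = [], []
for crew in ratings.values():
    all_ratings = [value for pair in crew for value in pair]
    m, s = mean(all_ratings), pstdev(all_ratings)
    for t1, t2 in crew:
        std_t1.append(50 + 10 * (t1 - m) / s)
        std_t2.append(50 + 10 * (t2 - m) / s)
standard_r = pearson(std_t1, std_t2)

print("raw score correlation:     ", round(raw_r, 2))
print("standard score correlation:", round(standard_r, 2))
```

In this constructed case the raw correlation is high and positive purely because of the difference in leniency, while the standard score correlation is essentially zero. Real ratings would fall between these extremes.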


Summary 


The attempt to develop a six trait scale in 
which each trait consisted of a cluster of sub- 
traits did not result in a scale wherein the 
major traits were independent of each other. 
Reliabilities and intercorrelations of these 
traits were essentially of the magnitude usually 
reported for more restricted traits. The first 
hypothesis was therefore refuted. 

Inasmuch as intercorrelations between narrowly restricted traits tend to be high (often as high as their reliabilities) and inasmuch as the same situation has been found to exist with regard to broadly worded factors consisting of clustered sub-traits, it would therefore appear unnecessary and unwise in many rating situations to obtain trait ratings. This would be the case when trait ratings (weighted or unweighted) are summed to obtain a composite score which is to be used for any purpose. It is simpler, more direct, and equally effective to obtain an overall rating instead of a composite based on highly correlated trait ratings. This does not necessarily deny all value of trait ratings. First, overall ratings may be more valid or reliable if made after consideration has been given to traits, even though trait intercorrelations are high. Second, even though intercorrelations are high, there will be some few individuals who are rated high on one trait and low on another. In such cases the trait ratings can be helpful. This appears particularly true when a primary purpose of ratings is to serve as a basis for a supervisor-employee conference on employee progress.

When converted to standard score form, intercorrelations dropped in magnitude but reliabilities dropped as much. Therefore the various trait ratings remained of little if any value as discrete traits. The consistent drop in reliabilities which occurred when original ratings were converted to standard scores indicates that reliability coefficients may be spuriously high if obtained from ratings not equated for differences in mean and sigma from one rater to another. The second hypothesis was therefore confirmed. Unless an investigator gives evidence that he has considered and eliminated the effects of any such differences, his reliability coefficients should be questioned to the point of rejection.


Received October 24, 1949. 





References

1. Bolanovich, O. J. Statistical analysis of an industrial rating chart. J. appl. Psychol., 1946, 30.
2. Driver, R. S. A case history in merit rating. Personnel, 1940, 16, 137-162.
3. Ewart, E., Seashore, S. E., and Tiffin, J. A factor analysis of an industrial merit rating scale. J. appl. Psychol., 1941, 25, 481-486.
4. Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936.
5. Kornhauser, A. W. A comparison of ratings on different traits. Personnel J., 1927, 5, 440-446.
6. Mahler, W. R. Some common errors in employee merit rating practices. Personnel J., 1947, 26, 68-74.
7. Paterson, D. G. Principles of merit rating. Personnel Digest, 1944, 1, 16-20.








How Readable Are Corporate Annual Reports? * 
Siroon Pashalian 


New York University 


and 


William J. E. Crissy 


Queens College 


Business enterprises have long been concerned with communication problems. Today there is increasing interest in how best to “get the word around” to jobholders, shareholders, customers, and the general public. One of the first formal media of communication to be used was the corporate annual report. In recent years an extensive literature has developed concerning how best to construct and publish such reports (1, 2, 7, 8). Most of the papers have reflected judgments of varying degrees of expertness rather than findings based upon experimental research.

A current problem in the construction of the annual report is that of endeavoring to make it more understandable and more widely read. Readership surveys show a considerable apathy to company reports. Yet, it is by no means an easy task to prepare adequate and concise reports. The challenge is one of presenting sufficient technical data and information for the financial expert, satisfying the requirements of the law (especially in railroad and public utility reporting), and at the same time, being meaningful to those whose interests are of a more general nature. The present demand crystallizes as the need for writing an informative account of the year’s operations, and for presenting the material in such a way that the report will be read.

In connection with this problem, one of the writers undertook to investigate the readability of corporate annual reports by means of the new “Flesch Readability Formulas” (5, 5a).


* This paper is a report on some of the findings from Pashalian’s M.A. thesis entitled, “An Investigation of the Application of the New Flesch Readability Formulas to Corporate Annual Reports,” submitted to the Department of Psychology, Graduate School of Arts and Science at New York University, October 1949. Explorative findings concerned with the relationship between Flesch indices and judges’ ratings of readability have not been reported due to basic design limitations.


Method 


The annual reports of those corporations 
that are listed in the Corporate Billion-Dollar 
Club in the June 11, 1949 issue of Business 
Week were included in the present study. 
These members are either non-financial com- 
panies with assets of over $1-billion, or with 
annual revenues or sales of over $1-billion, or 
both. Presumably then, these corporations 
have the largest number of stockholders, em- 
ployees and other persons interested in their 
operations. In other words, there is substan- 
tial public interest in these big corporations; 
their annual reports are expected to reach a 
vast audience. 

Applying the sampling technique suggested by Flesch (3, 4, 5), one-hundred-word samples were chosen from every other page of each of these twenty-six reports. This procedure, with no restrictions on the number of samples to be taken per report, is believed to have achieved a fair sampling in proportion to the length and breadth of the report. A total of 211 samples were examined; the average number of samples per report was 8.1.


Results 


The findings on the application of the new 
Flesch Readability Formulas to the twenty-six 
annual reports are listed in Table 1. 


Analysis of the Reading Ease Measures 


The range of readability scores was from 6 to 58. According to Flesch reference categories for these scores (5, 5a), these reports vary within descriptive styles of very difficult, as with material of scientific and professional journals, to fairly difficult, as in literary and quality magazines, such as Harper's. This range, interpreted in terms of the educational attainments of the U. S. adult population, suggests a potential audience of from 4½% of the population completing college, to 40% of the population who have had some high school education (4).

The average Reading Ease score for the entire set of reports was 34.37. Writing at this level is generally difficult, and descriptive of the style in academic material; for example, the Yale Review, which may be comprehended by 24% of the population who have graduated from high school or have had some college training.

The measure of average sentence length ranged from 16—fairly easy, typical of slick-fiction and understandable by 80% of the population, to 53—very difficult, above much scientific material and understood by approximately 4½% of the population.

The measure of the average number of syllables per 100 words ranged from 156—fairly difficult, to 183—difficult. Flesch has advised that a comfortable text contains one and one-half times as many syllables as words (6). The preponderance of reports in the difficult category on this measure (22 of the 26 reports) is indicative of a high level of abstraction in the language of these reports.

In relation to this syllable measure, the factor of numbers was encountered in the material of the annual reports. Under Flesch’s directions, numbers separated by space are counted as words in any text; several and lengthy figures should be omitted from the syllable count. Instead, a corresponding number of words to the number of figures omitted should be added, and their syllable totals added to those already counted (5). When applying these directions to the samples, the number of figures per 100 words was also recorded. The average number of figures per 100 words ranged from 2.30 to 8.80. The highest number of figures per 100 words ranged from 5 to 21. Thus, a surprisingly large number of figures appeared in small samples of 100 words. Their disastrous effect on the general reader not widely trained in numbers or mathematics invites speculation. It would seem, therefore, that greater care and attention should be given to determining best ways of presenting figures in such reports. Hundred-word samples that are crowded with 10 to 20 lengthy figures should caution writers or editors, and suggest their more effective incorporation in a table or chart.


Table 1

Summary of Reading Ease and Human Interest Measures on the Twenty-six Annual Reports

Industry                Average R.E.    Corporate Annual Reports
Merchandise             44.50           Sears, Roebuck; Montgomery Ward
Communications          43.00           Bell Telephone
Foods                   43.00           Swift & Co.; Armour & Co.; Safeway Stores
Autos & Accessories     40.00           General Motors; Chrysler Corp.
Oil                     33.50           Standard Oil (N. J.); Standard Oil (Ind.); Socony-Vacuum; Texas Co.; Gulf Oil; Standard Oil (Cal.)
Utilities               31.50           Consolidated Edison; Commonwealth-Southern
Railroads               28.50           Pennsylvania; NY Central; Southern Pacific; Santa Fe; Baltimore & Ohio; Union Pacific
Machinery & Supplies    26.00           General Electric
Metals & Chemicals      25.67           U.S. Steel; E. I. du Pont; Bethlehem Steel

[The per-report columns (Average Sentence Length, Average Number Syllables, Reading Ease Score, Percentage of Personal Words, Percentage of Personal Sentences, Human Interest Score) are not legible in this copy.]
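
For readers who wish to reproduce a Reading Ease score, the computation can be sketched as below. The formula and the descriptive bands are those published by Flesch for his revised Reading Ease measure; they are not restated in this paper, and the function names and the sample input are illustrative assumptions.

```python
def reading_ease(syllables_per_100_words, words_per_sentence):
    """Flesch Reading Ease from word length and sentence length."""
    return 206.835 - 0.846 * syllables_per_100_words - 1.015 * words_per_sentence


def reading_ease_style(score):
    """Map a Reading Ease score to Flesch's descriptive style category."""
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "very difficult"


# Hypothetical sample: 170 syllables per 100 words, 25 words per sentence.
score = reading_ease(170, 25)
print(round(score, 2), reading_ease_style(score))

# Scores reported in the text, classified by the same bands.
for reported in (6, 34.37, 58):
    print(reported, reading_ease_style(reported))
```

The bands agree with the categories quoted in the text: the lowest report (6) is very difficult, the average (34.37) is difficult, and the highest (58) is fairly difficult.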

The significance of these results on reada- 
bility obtained by analysis utilizing the Flesch 
technique is perhaps best indicated by showing 
what would improve the “scores” made. A 
need is demonstrated for more effective use of 
punctuational devices. The semicolon, for in- 
stance, can be more widely used to shorten 
sentence lengths and at the same time, to 
maintain any indications that the words and 
information belong closely together. 

Similarly, the need for writing at a less
difficult level is indicated. A “writing down”
would not necessarily be an under-estimation of
intelligence. People whom corporations want
to influence probably range from low normal 
intelligence to the superior. They have the 
capacity to grasp such concepts as gross sales, 
profits, etc. However, the vast audience 
which these reports reach is assumed by cor- 
porate reporters to possess far greater language 
facility than it does. The language of these 
reports, shown in terms of the education of the 
U. S. adult population above, is too difficult 
for the great bulk of this diversified readership 
to comprehend. Corporate writers are over- 
estimating the language experience of their 
potential audience—stockholders, employees 
and the general public. 




Human Interest Value 


The range of Human Interest scores was 
from 0 to 19. Again, in terms of Flesch refer- 
ences (5, 5a), these styles are from dull, de- 
scriptive of the style in scientific journals, to 
mildly interesting, descriptive of trade maga- 
zines. The average Human Interest score for 
the entire set of reports was 4.27—dull. 
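
The Human Interest score can be sketched in the same way. The formula and the style bands are those published by Flesch; they are not restated in this paper, and the function names and the sample percentages are illustrative assumptions.

```python
def human_interest(personal_words_per_100_words, personal_sentences_per_100_sentences):
    """Flesch Human Interest score from personal words and personal sentences."""
    return (3.635 * personal_words_per_100_words
            + 0.314 * personal_sentences_per_100_sentences)


def human_interest_style(score):
    """Map a Human Interest score to Flesch's descriptive style category."""
    bands = [
        (60, "dramatic"),
        (40, "highly interesting"),
        (20, "interesting"),
        (10, "mildly interesting"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "dull"


# Hypothetical sample: 1 personal word per 100 words, 2 personal sentences per 100.
sample = human_interest(1.0, 2.0)
print(round(sample, 3), human_interest_style(sample))

# Scores reported in the text, classified by the same bands.
for reported in (0, 4.27, 19):
    print(reported, human_interest_style(reported))
```

Because personal words carry by far the larger weight, a report with no personal words and no personal sentences scores zero, which is the lower end of the range reported here.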

Thus, the corporate writing of these twenty-
six annual reports is extremely low in human
interest value, i.e., in the personal words and 
sentences which provoke and continue general 
reader interest, and help the reader to under- 
stand the text better. In an era which is 
fostering the teamwork and cooperation of 
stockholders and employees alike, the need for 
the stress on “we” and “our” and “us” is
inescapable. “I” can serve to bridge the 
tremendous gap between the President and 
stockholders and employees. This set of re- 
ports used such words sparsely. Personal 
words help to convey the feeling in the material 
of having been written directly to the reader, 
whoever he may be. They can reflect the 
whole spirit and tone of the organization. 

In addition, corporate writing can direct 
greater attention to individual personalities. 
Although the sampling necessarily tapped only 
certain pages, extremely few samples men- 
tioned specific persons and their accomplish- 
ments. It appeared that much of this kind of 
information was confined to Employee Rela- 
tions headings or obituaries. People are in- 
terested in people, and they want to become 
better acquainted with the outstanding per- 
sonalities of the corporation. Yet, among the 
21,100 words sampled in this study, only ap- 
proximately twenty names were mentioned, 
and even these were noticeably concentrated 
within certain reports. 

Moreover, to enhance human interest value, 
there remains the need for the appropriate use 
of personal sentences—exclamations, questions 
and commands directly expressed to the reader. 
Only one report among the twenty-six possessed 
a sentence of this description in the sampling 
scheme—a question. More question-marks in 
annual reports can provoke thought, continue 
or revive reader interest. Similarly, direct 
commands are another interest-controlling 
device. Instead of the impersonal, “It will be
noted . . . ,” “Note . . .” can do much more
to invoke the effort of a glance at the charts 
and an independent analysis. 
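
The scoring behind these Human Interest figures can be sketched briefly. The formula below is the one generally attributed to Flesch's 1948 yardstick (reference 5); the coefficients and the verbal bands are assumptions to be checked against that source, and the function names and sample counts are hypothetical, not data from this study.

```python
def human_interest(personal_words_per_100: float,
                   personal_sentences_per_100: float) -> float:
    """Flesch Human Interest score (coefficients assumed from Flesch, 1948)."""
    return 3.635 * personal_words_per_100 + 0.314 * personal_sentences_per_100


def describe(score: float) -> str:
    """Map a score to a verbal label (cut-offs assumed, not taken from this article)."""
    if score < 10:
        return "dull"
    if score < 20:
        return "mildly interesting"
    if score < 40:
        return "interesting"
    if score < 60:
        return "highly interesting"
    return "dramatic"


# Hypothetical sample: 1 personal word per 100 words and
# 2 personal sentences per 100 sentences.
example_score = human_interest(1.0, 2.0)
example_label = describe(example_score)
```

Under these assumptions, a text with almost no personal words or sentences lands in the “dull” band, which is where the average report in this set fell.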


Sample Passages 


To illustrate the various levels of difficulty 
obtained by means of the new Flesch Read- 
ability Formulas, sample passages from the 
annual reports of these corporations which 
ranked twenty-sixth, thirteenth, and first in 
Reading Ease are furnished below: 

From the Union Pacific Railroad report: 


Capital Stock 


At the annual meeting of Union Pacific Rail- 
road Company stockholders held on May 11, 
1948, the Articles of Association were amended 
so that on July 1, 1948 the total number of 
authorized shares of preferred and common 
stocks of the Company were doubled (with no 
increase in the total aggregate par value 
thereof), and the then outstanding 995,431 
shares of $100 par value preferred stock became 
1,990,862 shares of $50 par value preferred 
stock, and the then outstanding 2,222,910 
shares of $100 par value common stock became 
4,445,820 shares of $50 par value common 
stock, each of the new $50 par value preferred 
and common shares being entitled to one vote 
at any meeting of stockholders. 


From the New York Central Railroad report: 


Dieselization is progressing 

Carrying forward our motive power moderni- 
zation, the Central and leased lines, together 
with two affiliates, the Pittsburgh & Lake Erie 
and the Indiana Harbor Belt Railroads, ordered 
in 1948 new Diesel-electric locomotives at a 
total cost of approximately $33,600,000. The 
bulk of these locomotives, on which deliveries 
will extend into 1950, are for road freight and 
for switching service. The Central's portion 
was about $24,790,000. 

Locomotives delivered during 1948 increased 
the Dieselized portion of the total road freight 
train mileage of the Central and leased lines to 
approximately 13.5 per cent by the end of the 
year. 


From the Swift and Company report: 
What Swift & Co. is Trying To Do 
The public rightly expects a business to 
accomplish certain desirable things. 
Who determines what is desirable in a free 
country? Not one man or a group of men. 
Each individual decides for himself whether he
will buy from, sell to, work for, or invest in a
company.

A decision to buy a product is a vote in its
favor. The votes of millions of people may 
cause prices to go up or down. The results 
quickly tell what the public thinks desirable. 

Such economic democracy can thrive only in 
a certain climate—one in which prices are free 
and competitive, and business is spurred by 
the hope of profits and the fear of losses. 


Analysis by Industries 


The arrangement of the twenty-six reports 
by industry in Table 1 facilitates interesting
and noteworthy comparisons. There seems 
to be a certain amount of homogeneity within 
industries on all the obtained measures of the 
Flesch formulas. At the same time, however, 
it must be remembered that the entire set of 
reports has demonstrated a great degree of 
homogeneity and narrow range under the 
Flesch technique. 

The reports of railroad companies have the 
greatest amount of variability on the measures 
employed. Their range in average sentence 
length is from 18 words (standard) to 53 words 
(very difficult). This observation seems to re- 
flect the great difficulty attached to railroad 
reporting due in part to legal specifications con- 
cerning content. Apparently, however, some 
railroad companies are fulfilling their legal and 
public obligations in a more effective manner 
of readability than others. 
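
To give a sense of what the railroad range in sentence length alone implies, the sketch below applies the Reading Ease formula generally attributed to Flesch's 1948 yardstick (reference 5). The coefficients are an assumption to be verified against that source, and the word-length value is hypothetical; it is held constant so that only sentence length varies.

```python
def reading_ease(syllables_per_100_words: float,
                 words_per_sentence: float) -> float:
    """Flesch Reading Ease (coefficients assumed from Flesch, 1948)."""
    return 206.835 - 0.846 * syllables_per_100_words - 1.015 * words_per_sentence


# Hypothetical word length, identical for both cases.
WORD_LENGTH = 160.0

short_sentences = reading_ease(WORD_LENGTH, 18)  # shortest railroad average
long_sentences = reading_ease(WORD_LENGTH, 53)   # longest railroad average
drop = short_sentences - long_sentences          # equals 1.015 * (53 - 18)
```

On this formula, moving from 18-word to 53-word sentences costs about 35.5 Reading Ease points whatever the word length, which is consistent with the distance between “standard” and “very difficult” noted above.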

Another striking inference may be obtained 
from Table 1. The arrangement by rank order 
in readability attainments under the formulas, 
parallels almost directly the degree of contact 
the companies have with the general public. 
Merchandising, Communications, Foods, Auto- 
mobiles and Accessories corporations cater to 
larger sections of the general public. Cor- 
porations dealing in Machinery and Supplies, 
Metals and Chemicals have a more restricted, 
less diversified market for their products (8). 
Since this observation is based on the small 
and variable groups of the study, however, 
more data and analysis are actually required 
for final proof. 

Nevertheless, such an arrangement is by no 
means unwarranted. It is generally accepted 
that the extent and character of the public 
interest should first be determined in the con- 
struction of the report. Then a corporate 
writer attempts to write to that audience. 
However, if this same trend had appeared
within lower degrees of difficulty, it could be
considered a more legitimate consequence of 
the nature of the enterprise and the groups 
interested in its operations. At the same time,
it would then meet the readability require- 
ments of these particular audiences. 


Summary 


1. Analysis of the readability of the twenty- 
six annual reports of corporations listed in the 
Billion-Dollar Club of Business Week, June 11, 
1949, by means of the new Flesch Readability 
Formulas, revealed that, on the whole, the 
general level of reading was difficult, and the
human interest value dull. 

2. These reports contain language which is 
beyond the language experience and fluent com- 
prehension of approximately 75% of the U.S. 
adult population. 

3. The Flesch technique demonstrates prom- 
ise as a method for indicating the difficult 
language elements in corporate reports. 

4. It also demonstrates promise as a method 
for spotting “impersonalness” in such writing.


When, as in this study, the writing sample 
involves a problem of mass communication, 
the Flesch technique appears to be a reasonable 
instrument. It gauges the likelihood that 
these annual reports will convey their messages 
to most of their prospective readers. Wider 
application of the technique in the construction
of the annual report is recommended. Used in
conjunction with the other types of practical 
hints in the literature, it can serve to strengthen 
the annual report as the most important single 
written communication between management 
and stockholders, employees, and the general 
public. 


Received October 7, 1949. 


References 


1. Dale, E. Preparation of company annual reports.
Research Report No. 10, American Management
Association, New York, 1946, 104 pp.

2. Doris, Lillian. Modern corporate reports. New
York: Prentice-Hall, 1948.

3. Flesch, R. F. Marks of readable style; A study in
adult education. New York: Bureau of Publications,
Teachers College, Columbia University,
1943. (Contributions to Education, No. 897.)

4. Flesch, R. F. The art of plain talk. New York:
Harper and Brothers, 1946.

5. Flesch, R. F. A new readability yardstick. J.
appl. Psychol., 1948, 32, 221-233.

5a. Flesch, R. F. The art of readable writing. New
York: Harper and Brothers, 1949.

6. Flesch, R. F. Making the narrative readable.
Chapter 15 in Modern Corporate Reports by
Lillian Doris, New York: Prentice-Hall, 1948.

7. Gibson, W. B., ed. The annual report; A study of
over 500 financial reports of leading American
Business Institutions showing the present style
trend and important physical characteristics.
Chillicothe, Ohio: Mead Corporation, Marketing
Research Division, 1939.

8. McLaren, N. L. Annual reports to stockholders:
Their preparation and interpretation. New York:
Ronald Press, 1947.





Rorschach Responses, Strong Blank Scales, and Job 
Satisfaction Among Policemen 


Solis L. Kates 


Michigan State College 


An individual with certain personality char- 
acteristics, just short of thoroughgoing psy- 
chotic disorganization, may be attracted to and 
satisfied with an occupation because his per- 
sonality traits are compatible with its demands. 
Certain personality traits, whether they con- 
tribute to a high degree of adjustment or 
maladjustment, may be of importance for
or lead to the development of interest and 
satisfaction in a particular occupation. For 
instance, many of the employed patients seen 
at a mental hygiene clinic gave Rorschach 
responses that were markedly abnormal yet 
they were able to carry on satisfactorily in 
their occupations. 

The behavior of an individual should be 
studied not only in relation to the total culture 
but with regard to his sub-cultural group and 
its demands upon him. When the personality 
tendencies of an individual are compatible 
with the demands of a particular sub-culture, 
probably little anxiety or self-dissatisfaction 
may be provoked. If these personality traits 
conflict with the sub-cultural demands, probably
much anxiety and self-dissatisfaction
may ensue. Because occupational groups are 
readily available for study, they have been 
selected here as a particular type of sub-culture 
for scrutiny. This study proposes only to 
ascertain the personality traits that appear to 
be possessed by policemen. Many occupa- 
tions will have to be investigated to ascertain 
the personality attributes associated with mem- 
bership in the occupation. But occupational 
groups must be carefully selected and the sub- 
jects identified with respect to the homogeneity 
of their vocational interests. It is interesting 
to note that studies in the field of vocational 
choice and achievement have been suggested 
by Rapaport (9) as a method of validating the 
Rorschach Test. 

The hypothesis that a significant relationship
exists between measured vocational interests
and job satisfaction postulated by
Strong (11) requires substantiation. An investigation
recently completed by the writer
(3) did not demonstrate any significant rela- 
tionship between measured vocational interests 
and job satisfaction of routine office workers. 
More evidence is essential with respect to this 
hypothesis. 

Subjects: The subjects of this study were 
twenty-five New York City Patrolmen who 
volunteered while off duty for testing. All 
came from one precinct. Their average age 
was 32.8 years and their mean educational at- 
tainment was 12.2 years. Due to the small 
sample, the results discussed below must be 
considered as suggestive rather than conclusive.

Method: The Rorschach Test was adminis- 
tered in groups of four or five and an individual 
inquiry was conducted with each subject. 
The subjects completed the Strong Vocational 
Interest Blank and a job satisfaction blank of 
the Hoppock type. The Rorschach responses 
were scored as suggested by Klopfer (4) and 
were evaluated by means of the Munroe In- 
spection List (6) which yielded a total score 
indicative of the degree of maladjustment. 
The subjects’ responses were appraised according
to the extent of their deviation from
clinically established normal limits in terms 
of 28 Rorschach categories. 


Results 


The mean policemen interest score for the 
subjects was 40.6 and the standard deviation 
was 10.1. The difference between this mean 
interest score and that for Strong’s criterion 
group was significant at the one per cent level. 
However, when the scores were divided into 
two categories corresponding to A and B+ 
ratings and those B and below in ratings, this 
distribution was not significantly different from 
the expected distribution of scores according 
to Strong’s data (11). 




Table 1


Product-Moment Intercorrelation Among Police Interests, Job Satisfaction, Occupational Level,
and Munroe Inspection Technique Scores


                      Police       Job             Occupational    Munroe
                      Interests    Satisfaction    Level           Scores

Police Interests        -            .35            -.74**           .19
Job Satisfaction       .35            -             -.51**           .4[?]*
Occupational Level    -.74**        -.51**            -             -.26
Munroe Score           .19           .4[?]*         -.26              -


* Significant at five per cent level.
** Significant at one per cent level.


The mean job satisfaction score of the policemen
was 20.0, with a standard deviation of 3.2,
while the mean Munroe Inspection score was
11.9, with a standard deviation of 3.7. The
mean occupational level score on the Strong
blank for the policemen was 48.9, standard
deviation 5.7, which mean was not significantly
different from that of Strong’s criterion group.

Table 1 lists the product-moment correlation
coefficients between the variables under consideration,
namely, job satisfaction, occupational
level, measured police interests, and
Munroe Inspection scores.

The biserial correlation coefficients between
some of the Rorschach response categories on
the Munroe Inspection List and job satisfaction,
police interests, and occupational level
are listed in Table 2.


Table 2


Biserial Correlation Coefficients Between Munroe
Inspection List Categories and Police
Interests, Job Satisfaction, and
Occupational Level


Rorschach          Police         Job              Occupational
Score              Interests      Satisfaction     Level

Color:Movement      .53*           .03              -.47*
CF:FC              -.25           -.02               .12
FC                  .31           -.03               .61*
FM:M                .41            .00               .00
M                   .61*           .00               .52*
Total Color         .55*          [illegible]        .58*
Fc                 -.18            .00               .25
F%                  .25            .20               .30
Popular            [illegible]    -.51*              .29
Dd                 -.18            .29               .10
W:M                [illegible]    [illegible]       -.6[?]*
W:M (M>W)          [illegible]    [illegible]       [illegible]


* Significant biserial correlation coefficients.

In Table 3, the respective quantities of the 
whole (W) and human movement (M) responses 
are shown that have been arbitrarily considered 
as the optimum ratios. If the number of whole 
(W) responses was smaller than its lower limits 
for the specific number of human movement 
(M) responses given in the protocol, the ratio 
was considered to be excessively in favor of the 
human movement (M) responses. If the num- 
ber of whole (W) responses was greater than 
its upper limits in relation to the human move- 
ment (M) responses then the ratio was con- 
sidered to demonstrate an excessive prepon- 
derance of whole (W) responses. 
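
The classification rule just described can be written out as a short sketch. The function name and return labels are hypothetical, and the limits are passed in as parameters because the pairing of each optimum W range with a particular number of M responses is not assumed here.

```python
def classify_w_to_m(w_count: int, lower: int, upper: int) -> str:
    """Classify a W:M ratio given the optimum W range for the observed M count."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    if w_count < lower:
        return "excess M"   # too few wholes for this many M responses
    if w_count > upper:
        return "excess W"   # too many wholes for this many M responses
    return "optimum"


# "5 to 12" is one of the optimum W ranges listed in Table 3; the M count
# it corresponds to is deliberately left unspecified.
labels = [classify_w_to_m(w, 5, 12) for w in (3, 8, 14)]
```

With that range, 3 whole responses would be scored as excessively in favor of M, 8 as optimum, and 14 as an excessive preponderance of W.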


Table 3


Arbitrary Optimum Numbers of Whole (W) Responses
in Relation to Human Movement
(M) Responses


Numbers of Human        Arbitrary Optimum
Movement (M)            Numbers of Whole
Responses               (W) Responses

[illegible]             2 to 4
[illegible]             3 to 8
[illegible]             5 to 12
[illegible]             6 to 15
[illegible]             8 to 15
[illegible]             9 to 15
[illegible]             10 to 16
[illegible]             11 to 18


Discussion


The mean score of the subjects was significantly
lower than that of Strong’s criterion
group, indicating that many subjects possessed
extremely low measured police interests. On
the other hand, the fact that the proportion 
of A and B+ interest ratings was as expected 
in terms of Strong’s criterion group illustrated 
that letter ratings were probably of greater 
accuracy than the magnitude of interest scores 
in forecasting the composition of individuals 
within an occupational group. Furthermore, 
letter ratings should be given more value than 
the magnitude of standard scores in guiding 
individuals. Finally, it may be concluded 
that many subjects entered police work be- 
cause of their genuine interest while the 
others had different reasons. 

The degree of job satisfaction was signifi- 
cantly greater than that of routine office clerks 
(3), significantly lower than that of nursing 
students (7), and slightly but insignificantly 
lower than that of engineering students (1). 
There was probably greater opportunity for 
policemen to express their abilities and pre- 
dilections in the pursuit of their duties than 
existed for routine clerks. The factor of 
greater remuneration of the policemen was an 
important element that could not be evaluated. 

The degree of maladjustment of the policemen,
as measured by the Rorschach Test, was
slightly but insignificantly smaller than that of
routine clerks (3) and slightly but insignificantly
greater than that of biologists (10).
Probably, the policemen, as a whole, demon- 
strated as many signs of maladjustment as may 
be found in other groups. 

No significant relationship was found to 
exist between the degree of measured police 
interests and satisfaction with police work. 
Consequently, the police interest scale of the 
Strong Blank cannot be used with any degree 
of certainty in forecasting satisfaction with 
police work. One of the reasons for this ab- 
sence of relationship may be assigned to the 
lack of precision of the measuring instruments. 
Moreover, Strong, in setting up his occupational
scales, did not attempt to differentiate
the satisfied from the dissatisfied members of 
the occupation. It was unwarranted to assume 
that those with the higher interest scores 
would be more satisfied than those with the 
lower scores. Similarly, since there was no 
significant relationship between job satisfaction 
and police interest ratings when the latter were 
separated into A and B+, and B and lower
ratings, letter ratings may not be utilized as a
predictor of work satisfaction. 
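
As a check on how a correlation of .35 can fail to reach significance with only twenty-five subjects, the sketch below applies the usual t test for a product-moment r to several of the coefficients reported in Table 1. It is an illustration only: the article does not state which test was used, and the critical values are the standard two-tailed tabled values for 23 degrees of freedom.

```python
import math

N = 25               # patrolmen in the sample
T_CRIT_05 = 2.069    # two-tailed critical t, df = 23, five per cent level
T_CRIT_01 = 2.807    # two-tailed critical t, df = 23, one per cent level


def t_for_r(r: float, n: int = N) -> float:
    """t statistic for testing a product-moment r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def level(r: float) -> str:
    """Return the strictest conventional level at which r is significant."""
    t = abs(t_for_r(r))
    if t >= T_CRIT_01:
        return "one per cent"
    if t >= T_CRIT_05:
        return "five per cent"
    return "not significant"


# Coefficients taken from Table 1.
results = {r: level(r) for r in (0.35, -0.74, -0.51, 0.19)}
```

On this test the interest-satisfaction coefficient of .35 gives t of roughly 1.8, short of the five per cent value, while the two coefficients involving occupational level exceed the one per cent value, in agreement with the asterisks in Table 1.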

Another reason for this absence of a signifi- 
cant relationship between job satisfaction and 
measured interests was the fact, as Strong 
indicated, that for some occupations, the weed- 
ing out process is not thoroughgoing and the 
elimination of candidates is not in terms of 
their personal attributes. Hence, the members 
of such occupations may not be homogeneous. 
What would satisfy one need not satisfy an- 
other member of the occupation. The more 
homogeneous the members, the more will 
similar occupational pursuits satisfy the mem- 
bers. The greater the homogeneity of mem- 
bers of an occupation, the greater and more 
significant will be the relationship between 
measured interests and job satisfaction. The 
less homogeneous the members, the smaller 
will be the relationship between measured interests
and job satisfaction. Policemen may
not be as homogeneous as, for example, phy- 
sicians and engineers and hence the relationship 
between their measured interests and job 
satisfaction may not be as great as that be- 
tween physicians and engineers. Further- 
more, in vocations where personal skills are 
required and the tasks are challenging, it is 
probable that the correspondence of measured 
interests to those of other successful members 
of the occupation would be significantly associ- 
ated with work satisfaction. [A more extensive 
treatment of this hypothesis can be found in 
the monograph by Kates (3). ] 

Finally, the unlikely possibility must be 
kept in mind that measured interests may 
change with experience on the job. Probably, 
the stability of measured interests goes hand in 
hand with the homogeneity of members of an 
occupation by being related as well to the 
extent and character of the elimination process. 
Where the weeding out process is extensive and 
is based on personal attributes, the measured 
interests of the members of the occupation 
should be quite stable. Where the elimination 
process is not so extensive and is not based on 
personal attributes, measured interests may be 
subject to change by further experience on 
the job. 

The significant negative relationship between 
the occupational level and police interests is 
quite similar to that found by Strong. For
those policemen with high measured interests,
the occupational level, probably indicative of 
vocational aspiration level (2), was low. 

The significant negative relationship between 
job satisfaction and occupational level lent 
further support to the acceptance of the occu- 
pational level scale as an indicator of the level 
of aspiration in vocational endeavor. The 
policemen who had high occupational levels of 
aspiration were prone to be dissatisfied with 
their jobs. It illustrates the fact that in police 
work, the occupational level is a more potent 
contributor to job satisfaction and dissatis- 
faction than measured police interests. Prob- 
ably, it may be stated that, for policemen, job 
satisfaction is not as intimately linked with 
measured police interests as it is related to 
their occupational levels of aspiration. The 
subjective reactions of the policemen in terms 
of their aspiration levels, as measured by the 
occupational level scores, were of greater im- 
portance to their job satisfaction than the mere 
possession of high police interests. The fact 
that the member may be similar in his meas- 
ured interests to other successful members of 
his occupation may not be as important to his 
satisfaction as the fact that he has not attained
to a level of vocational achievement that he
believes necessary for his success. This conclusion
is consistent with the experiments of
Lewin (5) demonstrating that the level of as- 
piration is a more vital factor for satisfaction 
with achievement than is objective perform- 
ance. Probably in the higher status occupa- 
tions such as physician and lawyer, there 
would be a positive relationship between occupational
level and job satisfaction.

The more maladjusted policemen, as meas- 
ured by the Rorschach Test and Munroe In- 
spection Technique (assuming that the Munroe 
is a valid criterion of maladjustment for N.Y.C. 
policemen), tended to be more satisfied with 
their work than the less maladjusted. Hence, 
this finding corroborated the hypothesis that 
the individual’s maladjustments might con- 
tribute to his job satisfaction rather than to 
his job dissatisfaction. It controverted the 
thesis that most job dissatisfaction stemmed 
from the individual's maladjustment. 

It was ascertained that policemen with high 
measured police interests could be or could not 
be maladjusted. Similarly, high occupational
levels of aspiration were not associated with
maladjustment. In police work, it may not 
be concluded that the individuals with the 
greater degree of vocational ambition tended 
to be more maladjusted. 

High police interest scores were associated 
with the tendency of policemen to give signifi- 
cantly more movement (M, FM, m) responses 
than color responses (FC, CF, C) to the 
Rorschach cards. They were prone primarily 
to be motivated by and responsive to inner 
promptings rather than to stimuli from without. 

High police interests were related to the 
tendency for policemen to give an adequate 
number of human movement (M) responses of 
reasonably accurate form. Low police inter- 
ests were associated with the giving by police- 
men of poor human movement (M) responses 
or two and less human movement (M) re- 
sponses. It may be said that the policemen 
who were prone to accept their inner prompt- 
ings as constructive and positive forces would 
have high police interests. 

The higher the police interests, the greater 
was the tendency for the subjects to give two 
and less color (FC, CF, C) responses. Police- 
men with high police interests did not possess 
the general readiness to establish emotional 
relationships with the world around them. 

In addition, the higher the job satisfaction 
score, the greater the tendency of policemen to 
give two and less color (FC, CF, C) responses. 
Satisfied policemen were not prone to establish 
a ready emotional relationship with the world 
around them. It is probable that policemen 
who tend to establish good social relations 
would not be happy on their jobs. Subjects 
who as a matter of practice did not establish 
good social relations would probably find their 
conditions of work rewarding. 

High job satisfaction was related to the 
giving by policemen of three and less popular 
responses. The more satisfied policemen were 
not likely to think along conventional lines. 

No other significant relationships were dis- 
covered between the Munroe Inspection List 
categories and police interests and job satis- 
faction. 

The higher the occupational level of aspira- 
tion, the greater the tendency for policemen to 
give total color (FC, CF, C) responses that 
were equal to or greater than total movement
(M, FM, m) responses. Higher occupational
levels of aspirations were associated with the 
policemen’s responsiveness to the environment 
about them. The same trend was evident in 
the finding that high occupational levels were 
related to the giving of an adequate num- 
ber of form color (FC) responses. Rational 
emotional reactions to stimuli by policemen 
were associated with high occupational levels 
of aspiration. Finally, high occupational levels
of aspiration were given by policemen who had 
an adequate number of color (FC, CF, C) responses,
demonstrating further that the ability
to be responsive to environmental stimuli went 
hand in hand with high vocational aspiration. 
It does sound reasonable and valid to have 
policemen with high occupational levels of 
aspiration display characteristics that are rec- 
ognized as necessary to achieve higher economic 
status. In another study, a definite relation- 
ship has been demonstrated to exist between 
great energy and activity in out-of-school and 
co-curricular activities of high school boys, and 
high occupational level scores (8). Since re- 
sponsiveness to outside stimuli is self-evident 
as a trait for carrying on co-curricular and out- 
of-school activities, one of the results of this 
study is to furnish additional evidence vali- 
dating the meaning of the Rorschach color 
responses. 

Low occupational levels of aspiration were 
associated with adequate human movement 
(M) responses of policemen. High occupa- 
tional levels were related to less adequate 
numbers of human movement (M) responses 
and poorly seen human movement (M) re- 
sponses. The more nearly the ratio, whole 
(W) responses to human movement (M) re- 
sponses, approached an arbitrary optimum, the 
lower the occupational level of policemen. 
When the number of human movement (M) 
responses in relation to whole (W) responses 
was in excess of the arbitrary optimum, the 
occupational level of the policemen tended to 
be high. 

It may be concluded that when policemen 
had an adequate number of human movement 
(M) responses in terms of both the Rorschach 
protocol and whole (W) responses, they were 
inclined to accept their inner promptings and 
their present status in life. They were not 
prone to aspire to higher occupational levels as
they had probably accepted as suitable their
own outlook on life as well as the realization 
that their attainments were consonant with 
their ability. Policemen with inadequate
human movement (M) responses tended not to 
accept their status in life, and appeared to have 
a need to aspire to higher occupational levels. 
The same conclusion applied to a preponder- 
ance of human movement (M) responses in 
relation to whole (W) responses with the prob- 
able additional meaning that the excessive 
inner strivings within the policemen drove 
them toward higher occupational levels. 


Summary 


The subjects of this study, New York City 
policemen, were found: (a) to demonstrate no 
significant difference from Strong’s criterion 
group in measured police interest ratings, (b) 
to have a relatively high job satisfaction level, 
and (c) to be no more maladjusted than routine 
office clerks and biologists. 

No relationship was discovered between the 
subjects’ measured interests and job satis- 
faction, between measured interests and Ror- 
schach maladjustment, and between occupa- 
tional level and Rorschach maladjustment. 
Significant relationships were found to exist 
between measured police interests and occupa- 
tional level, between job satisfaction and 
occupational level, and between job satis- 
faction and Rorschach maladjustment. In 
these conclusions, the assumption was made 
that the Munroe Inspection Technique is a 
valid criterion of maladjustment for New York 
City policemen. Several reasons were ad- 
vanced for the presence and absence of the 
above relationships. 

Additional support has been received from 
the results of this study to validate the meaning 
of the color (FC, CF, C) responses. The 
human movement (M) responses have been 
assessed in a different context and additional 
meanings have been attached to the relations 
of the human movement (M) response to the 
whole (W) response and to the total Rorschach 
protocol. 

Investigations of policemen who are under- 
going treatment as a consequence of emotional 
maladjustment should be of help in ascertaining
whether the hypothesis obtains, namely
that the personality traits of the particular
sub-cultural group with which the patient was
identified, should be known to evaluate his 
relative degree of maladjustment. 

Finally, policemen with high police interests 
tended to be markedly introversive, to have 
adequate ability to accept their own strivings 
and outlook as mature, and to be relatively 
unresponsive to stimuli from without. Satis- 
fied policemen tended to be relatively unre- 
sponsive to stimuli from without and to be 
lacking in the capacity to think along conven- 
tional lines. Further studies of the personality 
of policemen, if they confirm the above results, 
will be instrumental in helping to validate the 
significance of some Rorschach responses. 


Received October 17, 1949. 


References 


1. Berdie, R. F. The prediction of college achievement
and satisfaction. J. appl. Psychol., 1941,
25, 197-204.

2. Darley, J. G. Clinical aspects and interpretation of
the Strong Vocational Interest Blank. New York:
The Psychological Corporation, 1941.

3. Kates, S. L. Rorschach responses related to vocational
interests and job satisfaction. Psychol.
Monogr., 1950, 64, 3 (Whole No. 309).

4. Klopfer, B., and Kelly, D. The Rorschach technique.
New York: World Book Co., 1942.

5. Lewin, K., Dembo, T., Festinger, L., and Sears,
P. S. Level of aspiration. In J. McV. Hunt
(Ed.) Personality and the behavior disorders.
New York: Ronald Press, 1944, Vol. I, 333-378.

6. Munroe, Ruth L. The inspection technique: a
method of rapid evaluation of the Rorschach
protocol. Rorschach Res. Exch., 1944, 8, 46-70.

7. Nahm, Helen. Satisfaction with nursing. J. appl.
Psychol., 1948, 32, 335-343.

8. Ostrom, S. R. The OL key of the Strong Test
and drive at the twelfth grade level. J. appl.
Psychol., 1949, 33, 240-248.

9. Rapaport, D., Gill, M., and Shafer, R. Diagnostic
psychological testing. Chicago: Year Book Publishers,
Inc., 1946. Vol. II.

10. Roe, Anne. Analysis of group Rorschachs of biologists.
Rorschach Res. Exch., 1949, 13, 25-43.

11. Strong, E. K., Jr. Vocational interests of men and
women. Stanford, California: Stanford U. Press,
1943.





Card Versus Booklet Forms of the MMPI 
Wm. C. Cottle 


The University of Kansas 


The purpose of this article is to compare the 
card versus the booklet form of the Minnesota 
Multiphasic Personality Inventory (MMPI) 
to determine whether it would be possible to 
use these forms interchangeably in a testing 
program. Is it possible to consider them as 
comparable forms of the test? 

Wiener (1), alternating the card and booklet 
form in chronological order with each new 
counselee at the St. Paul Guidance Center, 
found that “there are no differences between 
the individual and group forms on any scales 
that approach statistical significance.” 

Holzberg and Allessi (4) using 30 psychiatric 
patients tested within one or two days on the 
full and abbreviated form of the MMPI, report 
that “although statistically significant differ- 
ences were found between the mean weighted 
scores of half the scales, these results were not 
clinically significant as judged by profile 
results.” 


The Sample 


A brief description of the sample of 100 cases 
included here is given in Table 1. It will be 
noted that the educational level of the group is 
relatively high since all are college students, 
with approximately half the group under- 
graduates and the rest graduate students in the 
School of Education at the University of 
Kansas. Modal school grade attained by this 
group was first year of graduate study. 

The group volunteered to take both forms of the test and no pressure was exerted to secure compliance.¹ As evidence of this, only 100 of the proposed group of 175 completed both forms of the test within the one week time-limit imposed.

Age of the group ranged from 18 to 60 with a median age for the 68 males of 27.8 years and for the 32 females of 21.5 years.

¹ Acknowledgment is made to Professors E. T. Gaston, A. H. Turney, J. F. Nickerson, G. M. Carney, and J. O. Powell of the University of Kansas for making it possible for their students to take these tests.


An undergraduate group of music education 
students contained 23 males and 19 females. 
In a subgroup of music education teachers 
taking graduate work there were 15 males and 
four females. In a subgroup of miscellaneous 
teachers and public school administrators there 
were 30 males and nine females. These com- 
prise the total group of 68 males and 32 females. 


Table 1
The Sample

A. Age: 18-20, 21-25, 26-30, 31-35, 36-40, 41-45, 46-50, 51-55, 56-60.

B. School Grade: 13, 14, 15, 17, 18, 19.

C. Occupation          Male   Female
   Mus. Ed. Student     23      19
   Mus. Tchr.           15       4
   Misc. Tchr.          30       9
   Total                68      32

D. Card Form Taken     Female
   First                 20
   Last                  12
   Total                 32

(The frequencies by sex for Parts A and B, and the male frequencies for Part D, are not legible in this copy.)




Table 2
Means and Standard Deviations of Card vs. Booklet Form of MMPI for 100 College Students

[Rows: mean and standard deviation for 68 males (card, booklet), 32 females (card, booklet), and 100 cases (card, booklet). Columns: the MMPI scales, among them K, F, Hs, D, Mf, Pa, Pt, Sc, and Ma. The cell values are not recoverable from this copy.]


Procedure


It was intended that one-half of the males and females would take the card form first and the booklet form second within one week. The other half would take the booklet form first and the card form second within a week. The booklet form was administered in a group situation and the card form was given individually. In actuality, as shown in Part D of Table 1, the planned procedure was achieved with the male group, but not with the female group. Twenty females took the card form first and only 12 females took the card form last.


Raw scores without the “K” correction were recorded for each individual on each form and Pearson product moment correlation coefficients were computed for males, females and the entire group on each scale. These are shown in Table 3.

Means and standard deviations for the en- 
tire group and separate groups of 68 males and 
32 females are shown in Table 2. The profile 
of mean scores for males is shown in Figure 1, 
and that for females is shown in Figure 2. 

In order to examine the magnitude of the standard error of estimate where card and booklet forms are assumed to be equivalent and to check the accuracy of the product moment correlations, a table of score deviations was constructed and correlation coefficients were computed by means of the ratio of the estimated true variance to the observed variance. In these computations variance error was computed as one-half the variance of the distribution of difference in raw score between the card and booklet form and observed variance was taken to be the average of the variance of the card and booklet form.*

Fig. 1. Mean T-scores on the MMPI for 68 college males.

Table 3 shows for each scale the correlation coefficient secured in this manner (r'_cb), the variance of the total distribution of differences in raw score between the two forms for each scale (V_d), the average of the variance for both forms (V_o), the standard error of estimate in raw score points, the approximate T-score change at one standard error, and the variance ratios of card and booklet forms.


* The formulae used for these computations were:

\[ V_d = i^2 \left[ \frac{\sum f_x x^2}{N} - \left( \frac{\sum f_x x}{N} \right)^2 \right] \]

Where V_d is variance of difference between raw scores on card and booklet; i is an interval of 1 raw score point; N = 100; and f_x is frequency of deviations per step interval.

\[ r' = 1 - \frac{V_e}{V_o} \]

Where r' is the correlation between card and booklet form; V_e = V_d/2, and V_o is average variance of card and booklet form.
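The variance-ratio computation described in this footnote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the five pairs of scores are hypothetical, not data from the study, and variances are taken with divisor N, which seems to match the footnote's use of N = 100.

```python
from statistics import pvariance


def variance_ratio_correlation(card, booklet):
    """Estimate r' = 1 - V_e / V_o from paired raw scores on two forms."""
    if len(card) != len(booklet):
        raise ValueError("each subject needs one score on each form")
    # V_d: variance of the booklet-minus-card differences (divisor N).
    differences = [b - c for c, b in zip(card, booklet)]
    v_d = pvariance(differences)
    # V_e: error variance, taken as one-half of V_d.
    v_e = v_d / 2
    # V_o: observed variance, the average of the two form variances.
    v_o = (pvariance(card) + pvariance(booklet)) / 2
    return {"V_d": v_d, "V_e": v_e, "V_o": v_o, "r_prime": 1 - v_e / v_o}


# Hypothetical raw scores for five subjects (not data from the study).
card_scores = [10, 12, 14, 16, 18]
booklet_scores = [11, 12, 15, 15, 19]
result = variance_ratio_correlation(card_scores, booklet_scores)
print(result)
```

Because V_e enters only through the difference scores, a constant shift between forms (for example, every subject scoring one point higher on the booklet) leaves r' unchanged; such a shift appears in the means rather than in this coefficient.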


Fig. 2. Mean T-scores on the MMPI for 32 college females.


Results 


With the exception of the coefficients shown for the lie scale (L), depression (D), and paranoia (Pa), the product moment correlation coefficients reported in Table 3 range from .72 to .91. Omitting the validity scales, those representing actual syndromes of maladjustment approach the size of accepted reliability coefficients. The most reliable scales would appear to be masculinity-femininity (Mf),* psychasthenia (Pt), and schizophrenia (Sc); all three above .85.

Holzberg and Allessi (4) report test-retest coefficients on the short and long form of the MMPI taken within three days which range from .519 to .927. With the exception of hypochondriasis (Hs), psychopathic deviate (Pd), and hypomania (Ma), these coefficients ranged between .72 and .93. The scales above .85 were the lie scale (L), validity (F), hysteria (Hy), and schizophrenia (Sc).

Rotter, in commenting upon the test-retest 
reliability of the card form of the MMPI, re- 
ports coefficients ranging .71 to .83 (2). Hath- 
away in personal correspondence with this 
writer indicates that test-retest coefficients 
ranging from .61 to .85 have been secured on 
seven scales using individual versus group form. 

* Reference to Table 3 shows that the separate coefficients for males and females on the Mf scale are both lower than the coefficient for the entire group on this scale. Combining these groups increases the range and consequently the size of the correlation coefficient for the total.
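The effect noted in this footnote can be illustrated with a small Python sketch. The scores below are invented for the purpose (they are not the Mf data of the study): two subgroups each show a modest card-booklet correlation, but their means differ, so pooling them widens the range and raises the coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical raw scores; the second group is the first shifted upward.
males_card = [20, 22, 24, 26, 28]
males_booklet = [24, 20, 26, 22, 28]
females_card = [34, 36, 38, 40, 42]
females_booklet = [38, 34, 40, 36, 42]

r_males = pearson_r(males_card, males_booklet)
r_females = pearson_r(females_card, females_booklet)
# Pooling the two groups adds between-group variation to both forms.
r_pooled = pearson_r(males_card + females_card,
                     males_booklet + females_booklet)
print(r_males, r_females, r_pooled)
```

In this constructed case the pooled coefficient exceeds both subgroup coefficients, which is the pattern the footnote describes for the Mf scale.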










Hathaway and McKinley (3) report reliability 
coefficients for the card form as follows: 


D with 40 normal cases, r = .77 ± .044
Pt with 47 normal cases, r = .74 ± .15 (test-retest)
Pt with 200 normal cases, r = .91 ± .07 (corr. split half)
Hy with 47 normal cases, r = .57 (test-retest)
Ma, number not given, r = .83 (test-retest)
Pd with 47 normal cases, r = .71 (test-retest)


Capwell (5) reports test-retest correlations on 85 public school girls ranging from .40 to .77, only three of which are above .72. She reports similar correlations on 98 home school girls which range from .33 to .71. The difficulty in comparing those studies to the present study is that only the one by Holzberg and Allessi employs a similar short time between tests. That is, it is difficult to estimate how much the reliability coefficient is affected by personality changes occurring in the length of time between tests.

It would appear that, with the exceptions 
noted above for L, D, and Pa, correlation 
coefficients between card and booklet forms of 
the MMPI reported here in Table 3 are as high 
as those reliability coefficients reported for the 
card form alone. This would suggest that the 
group or booklet form of the MMPI could be 
used interchangeably with the card form for a 
college group. The saving in time required to administer and score the booklet form makes it preferable in many psychometric situations.

Wiener (1) reports that on seven of the scales there is a slightly higher average score on the individual form than on the group form. Reference to Table 2 indicates that males score slightly higher on the booklet (group) form for all scales, except Ma, where mean scores are identical.

Mean scores for females are slightly higher for the booklet form also, except for Hs, Pt, and Sc. The combined mean scores for the 100 cases (P<.01) are all slightly higher for the booklet form. It would appear that there is a tendency for a college population to score slightly higher on the booklet than on the card form of the MMPI. A college population tends to place fewer items in the “Cannot Say” category when the booklet form is used.

These results do not necessarily conflict with 
the research of Wiener, because his group was 
a non-disabled male veteran population in a 
guidance center operated by the Department 
of Education at St. Paul, Minnesota. 

Let us consider the results shown in Table 3. It is evident there is no significant difference in the result of the two methods of computing the correlation coefficients between card and booklet form. The $\sqrt{V_e}$ is the best estimate of the standard error of the estimate, e.g., one-half the variance of the raw score difference between card and booklet forms on the Sc scale is 7.76 and the standard error of the estimate is calculated as 2.78. This means


Table 3

Correlation Coefficients and Related Estimates for Card vs. Booklet Form of the MMPI with 100 College Students

[Columns: the MMPI scales, among them Hs, D, Hy, Pd, Mf, and Pa. Rows: r_cb for 68 males; r_cb for 32 females; r_cb (product moment) for 100 cases; r'_cb = 1 − V_e/V_o for 100 cases; V_d; V_o; √(V_d/2)* in raw score points; approximate T-score change at 1 S.E.; F = variance ratio. The cell values are not recoverable from this copy.]

* Rounded from four to two places after computations were completed.







that in two out of three administrations, the raw score earned on one form will be within ±2.78 of the score that would be secured had the alternate form been given. This in sum is the meaning of the correlation coefficient (r'_cb) of .85 derived for the Sc scale.

The next to last row of Table 3 indicates the approximate change in T-score points for each shift of one standard error. Using the Sc scale again as an example, a change in raw score points of this magnitude results in a change of approximately four-tenths of a standard deviation. Such a change could be significant.
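The arithmetic behind the Sc example can be retraced with a short Python sketch. The function names are hypothetical, and the conversion to T-score points assumes a linear T scale with a standard deviation of 10 based on the sample's own observed variance, which is an assumption made here rather than a statement of how the published T-score row was obtained. The square root of 7.76 comes to about 2.79, close to the 2.78 reported; the small gap presumably reflects rounding of the published figures.

```python
import math


def standard_error_of_estimate(error_variance):
    """Standard error in raw score points: the square root of V_e."""
    return math.sqrt(error_variance)


def observed_sd_from_reliability(error_variance, r_prime):
    """Observed SD implied by r' = 1 - V_e / V_o, i.e. V_o = V_e / (1 - r')."""
    return math.sqrt(error_variance / (1.0 - r_prime))


def t_score_change(se_raw, sd_raw, t_sd=10.0):
    """Shift of one standard error expressed in T-score points (assumed SD 10)."""
    return t_sd * se_raw / sd_raw


# Figures quoted in the text for the Sc scale.
v_e = 7.76       # one-half the variance of the raw score differences
r_prime = 0.85   # correlation between card and booklet form

se = standard_error_of_estimate(v_e)
sd = observed_sd_from_reliability(v_e, r_prime)
shift = t_score_change(se, sd)
print(round(se, 2), round(sd, 2), round(shift, 1))
```

Under these assumptions the ratio of the standard error to the observed standard deviation equals the square root of (1 − r'), which for r' = .85 is a little under four-tenths, in line with the figure given in the text.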

Profiles of means shown in Figures 1 and 2 
indicate small T-score differences between card 
and booklet form. They could be considered 
normal profiles for a college group. 


Summary 

Correlation coefficients, means, and the standard deviations were computed for 68 male and 32 female college undergraduate and graduate students to determine the equivalence of the card and booklet forms of the MMPI. With the exception of L, D, and Pa, these coefficients range between .72 and .91. This is as high as the majority of the reliability coefficients reported for the card form alone.

Mean scores indicated in Table 2 suggest 
that a slightly more elevated profile would be 
secured on a college population by use of the 
booklet form. Fewer items are left un- 
answered on the booklet form. 

Inclusion of the standard error of estimate in raw score points and the approximate T-score change at one standard error in Table 3 permits one to judge whether he wishes to use these inventories as alternate forms. If these coefficients are interpreted as reliability indices, the standard error of estimate is as shown in Table 3. If the negatively skewed distribution of errors is recalled, that is, the tendency to score slightly higher on the booklet form, one is forced to adjust individual diagnosis upward from card to booklet form or resign himself to an inequivalence of forms.


Received October 21, 1949. 


References 


1. Wiener, D. N. Differences between the individual and group forms of the Minnesota Multiphasic Personality Inventory. J. consult. Psychol., 1947, 11, 104-106.

2. Buros, O. K. The third mental measurements yearbook. New Brunswick, Rutgers University Press, 1949. Comments of Julian B. Rotter on the Minnesota Multiphasic Personality Inventory on p. 60.

3. Hathaway, S. R., and J. C. McKinley. A Multiphasic Personality Schedule (Minnesota): I. Construction of the schedule. J. Psychol., 1940, 10, 249-254; II. A differential study of hypochondriasis. J. Psychol., 1940, 10, 255-268; III. The measurement of symptomatic depression. J. Psychol., 1942, 14, 73-84; IV. Psychasthenia. J. appl. Psychol., 1942, 26, 614-624; and V. Hysteria, hypomania, and psychopathic deviate. J. appl. Psychol., 1944, 28, 153-174.

4. Holzberg, J. D., and Allessi, S. Reliability of shortened MMPI. J. consult. Psychol., 1949, 13, 288-292.

5. Capwell, D. F. Personality patterns of adolescent girls. I. Girls who show improvement in I.Q. J. appl. Psychol., 1945, 29, 212-228; and II. Delinquents and non-delinquents. J. appl. Psychol., 1945, 29, 289-297.








A Factor Analysis of MMPI and Aptitude Test Data * 


Lt. Comdr. Ellsworth B. Cook (MSC) U.S.N.
Tufts Medical School, Boston, Mass. 


and 
Robert J. Wherry 


Department of Psychology, Ohio State University 


This paper discusses the results of the administration of psychometric and psychomotor tests to a group of 120 naval enlisted submarine candidates. The study comprised one phase of an investigation of the possible value of a wide variety of measures for the selection of submarine personnel (1).


Procedure 


Subjects were randomly selected (1) and two 
groups of six subjects each were tested weekly. 
The tests employed were: 


I. The Minnesota Multiphasic Personality Inventory: Hereinafter referred to as the MMPI, this test is designed to provide scores on all the more important phases of personality (2, 3, 4, 5, 6) and has been used extensively for the overall differentiation of normals from abnormals or persons predisposed to abnormal developments (7, 8, 9, 10, 11, 12). The short group form was used.

II. Two-Hand Coordination Test: This is a motor pursuit task which has been employed frequently in the selection of military personnel (14, 15, 16, 17, 18, 19). The essential psychological principle involves the carrying out of two coordinated movements simultaneously so that there is a conflict of attention. The subject is rated on his ability to manipulate hand cranks in such a way as to keep a small button in continuous contact with an irregularly moving disc. An electrically operated stop clock measures the total amount of time during which actual contact is maintained. Two f-minute trials were used.

* The study reported herein was conducted at the U.S.N. Medical Research Laboratory, U.S.N. Submarine Base, New London, Connecticut, under BuM&S Research Project NM-003-017.

Opinions expressed are those of the authors and are not to be construed as necessarily reflecting the views or the endorsement of the Navy Department.


III. Basic Battery of Written Tests: This battery consisted of a test of arithmetical reasoning (fractions, percentages, proportions, etc.), mechanical and electrical knowledge
(picture identification tests), mechanical apti- 
tude (simple principles of physics—levers, 
pulleys, braces, etc.), and the General Classifi- 
cation Test (verbal abilities). These are 
standard Navy tests for enlisted men (20, 21). 
In order to qualify for Submarine School, en- 
listed candidates must have a combined score 
of 100 on the GCT and arithmetic tests (22). 

IV. Navy Enlisted Personal Inventory: This 
consisted of form 2 of the Personal Inventory 
(23), a group test which presents a standardized 
psychiatric interview in pencil and paper form. 
The forced-choice type items which comprise 
the inventory are based on case history dis- 
similarities between psychiatrically undesir- 
able and normal military personnel (24, 25, 26). 
Inasmuch as individual interviews must neces- 
sarily be brief during large scale selection 
programs, the P.I. serves as a rough screening 
device to guide the psychiatrist in orientating 
his interview. Scores on the two sections 
(personal history and medical history) were 
treated as separate variables. 

V. Tank Performance. Subjects were rated 
on a five-point scale by a submarine medical 
officer for their overall performance while un- 
dergoing routine training procedures at the 
Escape Training Tank. This device, em- 
ployed to acquaint personnel with the method 
of escaping from a submerged submarine, is 
a tower containing a column of fresh water 25’ 
in diameter and 100’ deep. A training bell 
permits an ascent from any desired depth, and 
hatches or locks are located at various points 
in the tower. Subjects made two underwater 
ascents from each of the 12’, 18’ and 50’ depths. 
They were rated on such items as evidence of apprehension, quickness of response to instructions, errors of position on the line, “freezing” on the line, fighting to get out of the water too quickly, and so forth.


Table 1

Means and Standard Deviations of Variables Selected for Analysis

Var. No.   Variable Description

Minnesota Multiphasic
   01      Lie Score
   02      F (Validity) Score
   03      Hs (Hypochondriasis) Score
   04      D (Depression) Score
   05      Hy (Hysteria) Score
   06      Pd (Psychopathic Deviate) Score
   07      Mf (Interest) Score
   08      Pa (Paranoia) Score
   09      Pt (Psychasthenia) Score
   10      Sc (Schizophrenia) Score
   11      Ma (Hypomania) Score

   12      Two Hand Coordination (C Score)

Navy Basic Battery
   13      General Classification Test
   14      Arithmetical Reasoning
   15      Mechanical Aptitude
   16      Mechanical Knowledge
   17      Electrical Knowledge

   18      Tank Performance Grade

Navy P. I.
   19      Personal History
   20      Medical History

(The mean and standard deviation columns are not recoverable from this copy.)


Statistical Analysis and Results 


Data on 111 of the 120 subjects were utilized for statistical analysis. Three men were dropped because application of the standard criteria indicated that their MMPI scores were invalid. Six others were excluded because records on them were incomplete. The 20 variables selected for analysis are listed in Table 1, together with their means and standard deviations.

A comparison of mean scores for the 9 
personality scales with those obtained by 
roughly similar groups (27, 28) indicated that 
performance on the MMPI was typical of a 
young male adult population. 

The mean C score of 11.8 on two-hand coordination placed the group almost one standard deviation above the mean of the sample on which this test was validated for naval use (29), while the standard deviation of 1.9 was comparable to that of the standardizing group (29).


Scores were better than average on all the 
items of the Navy basic battery of written tests. 
This was to be expected since standards for 
the submarine service are higher than for the 
Navy generally (22). Mean scores closely 
approximated those of several hundred ex- 
perienced submariners who were reassigned to 
New London in the summer of 1945 (30). 

Mean scores were well below the established 
cut-offs for both sections of the Navy Personal 
Inventory. 

A tabulation of subject score range on each personality scale of the MMPI (Table 2), a graphical presentation of the MMPI profiles of individuals who scored 70 or above on two or more scales compared with the mean of the whole group (Figure 1), and a comparison of subject performance on two-hand coordination with several other submarine populations (Table 3) are available on request.¹ Space does not permit their inclusion here.

¹ To reduce printing costs, Tables 2 and 3 and Figure 1 have been deposited with the American Documentation Institute. Order Document 2828 from American Documentation Institute, 1719 N Street, N.W., Washington 6, D. C., remitting $.50 for microfilm (images 1 inch high on standard 35 mm. motion picture film) or $.50 for photocopies (6 X 8 inches) readable without optical aid.





Table 4

Intercorrelations of Personality and Aptitude Tests

[The body of this table, giving the intercorrelations and the residuals, is not recoverable from this copy.]



The intercorrelations of the 20 variables are 
presented in Table 4. A modified (31) Thur- 
stone Group Centroid (32) factor analysis 
yielded six independent factors to explain the 
intercorrelations obtained. The residuals aris- 
ing when one attempts to explain the inter- 
correlations on the basis of the factor loadings 
are also included in Table 4. The factor load- 
ings for the variables are given in Table 5. A 
factor loading represents the correlation be- 
tween a given measurement and one of the 
factors isolated. It may be positive or nega- 
tive depending on the nature of the relationship 
with the particular variable involved. The 
factor loading squared gives the percentage of 
score variance of a given measurement which 
may be explained or predicted by the factor in 
question. A loading of .20 or higher is regarded as significant. The reader is reminded that the labelling of factors is a matter of interpretative judgment rather than a problem in statistics, and that he is free to consider and suggest alternate designations.
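The relations just described (the squared loading as explained variance, and the residual left after the loadings are used to reproduce an intercorrelation) can be sketched in Python. This is an illustration only: the function names are hypothetical, the factors are treated as independent as the text states, the loadings not quoted in the text are set to zero purely for convenience, and the observed correlation of .60 is invented.

```python
def variance_explained(loading):
    """Share of a variable's score variance attributable to one factor."""
    return loading ** 2


def communality(loadings):
    """Total share of variance explained by all factors together."""
    return sum(a * a for a in loadings)


def reproduced_correlation(loadings_a, loadings_b):
    """Correlation implied by independent factors: sum of loading products."""
    return sum(a * b for a, b in zip(loadings_a, loadings_b))


def residual(observed_r, loadings_a, loadings_b):
    """What remains of an observed correlation after the factors are removed."""
    return observed_r - reproduced_correlation(loadings_a, loadings_b)


# Loadings on factors A..F. Only the factor B values (.76 and .71) are
# quoted in the text; the zeros are placeholders, not published loadings.
gct = [0.0, 0.76, 0.0, 0.0, 0.0, 0.0]
arithmetic = [0.0, 0.71, 0.0, 0.0, 0.0, 0.0]

sc_on_factor_a = variance_explained(0.93)   # schizophrenia scale, factor A
reproduced = reproduced_correlation(gct, arithmetic)
residual_value = residual(0.60, gct, arithmetic)  # 0.60 is a made-up observed r
print(sc_on_factor_a, communality(gct), reproduced, residual_value)
```

A small residual indicates that the retained factors account for nearly all of the observed relationship between the two measures; large residuals would suggest that a further factor remains to be extracted.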

Factor A has high positive loadings on the validity (.64), hypochondriasis (.79), psychasthenia (.72) and schizophrenia (.93) scales of the MMPI, and lower but still significant loadings on the depression (.28), hysteria (.28), psychopathic deviate (.33), masculinity-femininity interest (.33), paranoia (.25), and hypomania (.41) scales of the MMPI, as well as on the personal (.21) and medical (.38) history sections of the Navy Personal Inventory. In general, then, it has significant projections on all items which measure neurotic tendencies, and is labelled tendency to personality maladjustment. The word “tendency” is employed to emphasize that the group was a normal one. Factor A appears comparable to the general factor “maladjusted tendencies” isolated by Cottle in his study of the MMPI and the Bell Adjustment Inventory (33).


Table 5

Final Factor Loadings

[Columns: Factor A, Tendency to Personality Maladjustment; Factor B, Numerical-Verbal Intelligence; Factor C, Tendency to Over-Activity; Factor D, Tendency to Paranoia; Factor E, Mechanical Coordination; Factor F, Tendency to Femininity of Interest Pattern; Communality. Rows: the 20 variables, from the MMPI Lie Scale through Medical History. The cell values are not recoverable from this copy.]


Factor B has high positive loadings on the 
GCT (.76) and arithmetic (.71) tests and a 
lower positive loading (.24) for mechanical 
aptitude. This factor appears indicative of 
the ability to follow directions, and akin to the 
trait measured by traditional intelligence tests. 
Accordingly it is designated numerical-verbal 
intelligence. The factor has a significant nega- 
tive loading (—.26) on the lie index of the 
MMPI, implying that persons who do well in 
intelligence tests tend to refrain from falsifying 
answers on personality tests. 

Factor C has its highest loading on the hy- 
pomania scale of the MMPI (.56) and signifi- 
cant positive loadings on mechanical aptitude 
(.34) and mechanical knowledge (.38) as well. 
This is a logical pattern in that overactive indi- 
viduals often find outlet in mechanical pursuits. 
The factor is called tendency to over-activity. 

Over-active persons possess a considerable degree of emotionality (as evidenced by the negative loading of −.38 on the psychopathic deviate scale); this emotionality is shallow but varied.

The factor has significant negative loadings on the “neurotic triad,” that is, the hypochondriasis (−.23), depression (−.48) and hysteria (−.43) scales of the MMPI, indicating that individuals high on this factor tend to lack self-consciousness and self-criticism and have a direct acceptance of the environment. This suggestion of a “recklessness pattern” among men interested in submarine duty is somewhat similar to the finding of an Air Forces study of the traits of fighter pilots (34).

It is interesting to note that factor C has 
nearly zero loadings on the two-hand coordina- 
tion test, although one would normally expect 
a correspondence between mechanical aptitude 
and two-hand coordination. The over-pro- 
ductivity in thought and action is evidently 
sufficient here to cause an attempt to think 
ahead, to “beat” the gadget by anticipating 
its movements, and, actually, to result in poor 
coordination performance. 

The negative loading of −.19 on tank performance grade shown for this factor is worthy of mention, even though the loading is just under the established criterion (.20) of significance. Tank performance rating penalizes a man who “rushes” the line in an attempt to complete an ascent too quickly. Here again, the element of impatience and impulsiveness appears. The finding is suggestive in view of a wartime service report (35) issued after a submarine crew had been subjected to long submergence and heavy depth charging. In the colorful language of that report: “. . . when the long dive was over . . . the people who lasted out were those of a more phlegmatic disposition who didn’t bother much when things were running smoothly. The worriers and hurriers had all crapped out, leaving the plodders to bring home the ship” (35).

Factor D is labelled tendency to paranoia 
from the loading of .48 on the paranoia scale 
of the MMPI. The high loading on the lie 
index of the MMPI is logical in that individuals 
tending toward that trait approach personality 
tests suspiciously, and are prepared to admit 
nothing which might show them in an un- 
favorable light. The loading of .38 on the 
interest scale suggests that the individuals 
high on factor D were the more effeminate 
members of the group. There is a negative, not quite significant loading on GCT, suggesting that those who falsify on the lie questions of the MMPI do poorly on GCT. Thus, factors
B and D give corroborative support to one 
another. There may well be an index of 
stupidity present here also, with the less in- 
telligent men falling more easily into the trap 
presented by the lie questions. 

Factor E has its highest loadings on electrical 
knowledge (.68), mechanical knowledge (.57), 
mechanical aptitude (.43), and two-hand coor- 
dination (.36), and accordingly it is designated 
as mechanical coordination. The factor has a 
significant positive loading also on the validity 
scale of the MMPI (.31) indicating that per- 
sons high in mechanical coordination were 
meticulous in answering the questions of the 
personality test. The negative loading on the 
interest scale (—.15), while not quite signifi- 
cant, implies that the more masculine members 
of the group were more proficient mechanically. 
Factor E indicates also that the expected correspondence between mechanical ability and two-hand coordination is present when loadings on neurotic items are negligible, as is the case here.

Factor F has positive significant loadings on the masculinity-femininity interest scale (.39), the psychasthenia scale (.47) and the personal history section of the P.I. (.40). The significant negative loading on mechanical aptitude (−.41) is taken to indicate that a man leaning toward the feminine side of the interest scale is likely to get a lower score in mechanical tasks than will a person whom this scale measures as more positively masculine in interests. This supports the Terman-Miles view that there is a pronounced relationship between masculinity and mechanical pursuits at every educational level (36), and Strong’s definition of masculinity scores as an interest in things or objects rather than in persons or personalities (37). The most likely designation for factor F appears to be tendency to femininity of interest pattern. The high loading on psychasthenia shown for this factor suggests that the more effeminate man tends toward compulsive behavior; this is consistent with the MMPI test development where this is regarded as more a feminine than a masculine trait (13).

One purpose of each area study of the type reported herein is to select single or composite measures of the truly basic factors isolated, for relational analysis to significant components found in other areas (1). All six factors isolated in this area investigation will be represented in the final matrix.


Summary 


1. Men who do well on GCT and arithmetic 
tests tend to refrain from falsifying on the lie 
questions interspersed throughout the MMPI. 

2. When neurotic elements are absent, there 
is the expected correspondence between me- 
chanical ability and two-hand coordination. 
However, when such elements are present in 
the personality, they tend to make an indi- 
vidual rigid and confused with two-hand 
coordination. 

3. A man whom the MMPI estimates as 
tending toward femininity of interest pattern 
is likely to do less well on tests of mechanical 
ability than will one whom the Mf interest scale 
measures as more positively masculine. 

4. A trait suggestive of over-activity and recklessness was found in the personality pattern of some subjects. While alertness and daring are pre-requisites for successful submarine action, there is some evidence that over-active individuals may find it difficult to tolerate prolonged submergence and confinement. Presumably the “hurrier” and the “plodder” both have a place in the complete scheme of underseas operation, with its long periods of monotony interspersed with moments of intense activity.

5. The evidence of a relationship between 
performance on intelligence and aptitude tests
with personality traits as measured by the 
MMPI is considered worthy of note inasmuch 
as the minor personality accentuations found 
in this sample were within the generally accept- 
able ranges. 


Received September 26, 1949. 


References 


1. Cook, E. B., and Wherry, R. J. A study of the
interrelationships of psychological and physio-
logical measures on submarine enlisted candi-
dates: I. History, experimental design and
statistical treatment of data. Report No. 1,
BuM&S Research Project NM-003-017, U. S.
Naval Medical Research Laboratory, U. S.
Naval Submarine Base, New London, Conn.,
9 March 1949.

2. Hathaway, S. R., and McKinley, J. C. A multi-
phasic personality schedule: I. Construction of
the schedule. J. Psychol., 1940, 10, 249-254.

3. McKinley, J. C., and Hathaway, S. R. A multi-
phasic personality schedule: II. A differential
study of hypochondriasis. J. Psychol., 1940,
10, 255-268.

4. Hathaway, S. R., and McKinley, J. C. A multi-
phasic personality schedule: III. The measure-
ment of symptomatic depression. J. Psychol.,
1942, 14, 73-84.

5. McKinley, J. C., and Hathaway, S. R. A multi-
phasic personality schedule: IV. Psychasthenia.
J. appl. Psychol., 1942, 26, 614-624.

6. McKinley, J. C., and Hathaway, S. R. A multi-
phasic personality schedule: V. Hysteria, hypo-
mania and psychopathic deviate. J. appl. Psy-
chol., 1944, 28, 153-174.

7. Morris, W. W. A preliminary evaluation of the
Minnesota Multiphasic Personality Inventory.
J. clin. Psychol., 1947, 3, 370-374.

8. Hunt, H. F., Carp, A., et al. A study of the differ-
ential diagnosis efficiency of the Minnesota
Multiphasic Personality Inventory. J. consult.
Psychol., 1948, 12, 331-336.

9. Schiele, B. C., Baker, A. B., and Hathaway, S. R.
The Minnesota Multiphasic Personality Inven-
tory. n.d. Departments of Neuropsychiatry
and of Psychology, University of Minnesota
Medical School.

10. Meehl, P. E. Profile analysis of the Minnesota
Multiphasic Personality Inventory in differential
diagnosis. J. appl. Psychol., 1946, 30, 517-524.










11. Clark, J. H. Application of the MMPI in differ-
entiating A.W.O.L. recidivists from non-recidi-
vists. J. Psychol., 1948, 26, 229-234.

12. Abramson, H. A. The Minnesota personality test
in relation to selection of specialized military
personnel. Psychosom. Med., 1945, 7, 178-184.

13. Hathaway, S. R., and McKinley, J. C. Manual of
the Minnesota Multiphasic Personality Inven-
tory, Revised Edition. New York: The Psycho-
logical Corporation, 1943.

14. McFarland, R. A., and Channell, R. C. A two-hand
coordination apparatus for appraising apti-
tude for flying. Division of Research, C.A.A.,
Washington, D. C., March 1942.

15. Anon. The two-hand coordination test perform-
ance of submarine men. Brown University,
Providence, R. I. Report No. 5, project 44,
Section D-4, NDRC, September 1942.

16. Graham, C. H., Riggs, L. A., Bartlett, N. R., et al.
A report of research on selection tests at the
U. S. Submarine Base, New London, Conn.
Brown University, Providence, R. I. OSRD
report No. 1770, project 44, Division 7, June
1943.

17. McFarland, R. A., and Channell, R. C. A revised
two-hand coordination test. Airman Develop-
ment Division, C.A.A., Washington, D. C. Re-
port No. 36, October 1944.

18. McFarland, R. A., and Franzen, R. The Pensacola
study of naval aviators. Division of Research,
C.A.A., Washington, D. C. Report No. 38,
November 1944.

19. NRC Comm. on Selection and Training of Aircraft
Pilots. Report on the Boston-Midwest project.
Division of Research, C.A.A., Washington, D. C.,
Report No. 52, November 1945.

20. U. S. Navy. Arithmetical reasoning test and
mechanical aptitude test. Bureau of Naval Per-
sonnel, Training Standards Section, Standards
and Curriculum Division, Test and Research
Unit. NavPers 16992, December 1944 (R).

21. U. S. Navy. Electrical knowledge test, mechanical
knowledge test, general classification test. Bu-
reau of Naval Personnel, Training Standards
Section, Standards and Curriculum Division,
Test and Research Unit. NavPers. 16994,
December 1944 (R).

22. Willmon, T. L. Outline and discussion of methods
for selection of submarine reserve personnel.
U. S. Naval Medical Research Laboratory, U. S.
Naval Submarine Base, New London, Conn.
16 February 1948.

23. U. S. Navy. Navy enlisted personal inventory,
form 2, NavPers. 16845, IBM Form I.T.S.
1100 A 1165 (R).

24. Shipley, W. C., Gray, F., and Newbert, N. Stand-
ardization and validation of the personal inven-
tory: psychiatric criterion. OSRD report No.
1606, Brown University, Providence, R. I. June
1943.

25. Shipley, W. C., and Graham, C. H. Final report
in summary of research on the personal inventory
and other tests. Applied Psychology Panel,
NDRC, Report No. 10, project N-113, August
1944.

26. Kogan, L. S., Wantman, M. J., and Dunlap, J. W.
Analysis of the personal history inventory.
Division of Research, C.A.A., Washington, D. C.
Report No. 42, February 1945.

27. Wiener, D. N. Differences between the individual
and group forms of the MMPI. J. consult.
Psychol., 1947, 11, 104-106.

28. Clark, J. H. Some MMPI correlates of color re-
sponses in the group Rorschach. J. consult.
Psychol., 1948, 12, 384-386.

29. Bartlett, N. R. Review of research and develop-
ment in examination for submarine training
1942-1945. Report No. 2, BuM&S Research
Project NM-003-036, U. S. Naval Medical Re-
search Laboratory, U. S. Naval Submarine Base,
New London, Conn. (In preparation.)

30. Bartlett, N. R. Report on correlations of tests
with grades in submarine school. Report No. 2,
BuM&S Research Project X-243 (sub. 47), U. S.
Naval Medical Research Laboratory, U. S. Naval
Submarine Base, New London, Conn. 13 Feb-
ruary 1945.

31. Wherry, R. J., Brogden, H. E., and Gaylord, R. H.
Wherry-Brogden, Gaylord method of factor
analysis. Personnel Research Section, Adjutant
General’s Office, Department of the Army,
Washington, D. C. (unpublished).

32. Thurstone, L. L. Multiple factor analysis. Chi-
cago: University of Chicago Press, 1947.

33. Cottle, W. C. A factorial study of selected instru-
ments for measuring personality and interest.
Guidance Bureau, University of Kansas, n.d.

34. U. S. A. A. F. Psychological research on opera-
tional training in the continental air forces:
AAF Aviation Psychology Program, Report No.
16, Washington, D. C.: U. S. Government Print-
ing Office, 1947.

35. U. S. Navy. Depth charging of the U.S.S. Puffer.
Section 71 T of report, Enemy anti-submarine
measures. n.d.

36. Terman, L. M., and Miles, C. C. Sex and per-
sonality. New York: McGraw-Hill Co., 1936.

37. Strong, E. K., Jr. Vocational interests of men and
women. Palo Alto: Stanford University Press,
1943.




A Combined Oral Reading and Psychogalvanic Response Technique 
for Investigating Certain Reading Abilities 
of College Students 


Homer L. J. Carter 


Western Michigan College 


The purpose of this study is to describe and 
evaluate a combined oral reading and psycho- 
galvanic response technique for investigating 
certain reading abilities of college students. 
It is postulated that the procedure may be of 
value not only in determining such factors as
reading rate, comprehension, and errors, but as 
a means of discovering how much the reading 
situation affects the individual. In therapy
this is important, for as Maier (4) has shown,
the frustration state, not behavior symptoms, must
be treated. In order to evaluate this pro- 
cedure, the materials and apparatus have been 
described, superior and inferior readers have 
been studied, resulting group data have been 
compared statistically, and inferences have 
been set forth tentatively. 


Materials and Apparatus 


(1) Gray Oral Reading Paragraphs Test (3). 
Six of the even numbered paragraphs were 
taken from the Gray Oral Reading Paragraphs 
Test. These paragraphs were typed on 3” × 5”
cards, and on the back of each card five ques- 
tions were typed. It is assumed that the para- 
graphs given in the order 2, 4, 6, 8, 10, 12 con- 
stitute a scale of increasing difficulty but with 
steps of greater magnitude than that of the 
original scale from which the paragraphs were 
chosen. This modification of the Gray Oral 
Reading Paragraphs Test provides a record of 
reading rate, a comprehension score, and such 
reading errors as words aided, mispronounced, 
omitted, substituted, inserted and repeated. 

(2) Apparatus. The apparatus used through-
out this experiment was a “Maico Psychom-
eter” (1). Change in palmar skin resistance
(Delta R) can be measured in units ranging 
from zero to 100 and stated in ohms by merely 
multiplying the indicated unit on the scale 
by 200. In this experiment, only change in 
palmar skin resistance in response to given
stimuli is considered and it has been assumed
that the extent of the deflection of the galva- 
nometer is roughly proportionate to the in- 
tensity of the emotion or degree of frustration. 
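
The conversion from scale units to ohms described above can be written out as a short Python sketch. It is illustrative only: the function name and the range check are assumptions made for the example, and only the factor of 200 ohms per scale unit and the scale range of zero to 100 come from the text.

```python
OHMS_PER_UNIT = 200  # from the text: multiply the indicated scale unit by 200


def delta_r_ohms(scale_units: float) -> float:
    """Convert a psychometer scale reading (0 to 100) to a change in ohms."""
    if not 0 <= scale_units <= 100:
        raise ValueError("scale reading must lie between 0 and 100")
    return scale_units * OHMS_PER_UNIT


if __name__ == "__main__":
    # Hypothetical readings, used only to show the arithmetic.
    for reading in (10, 25, 100):
        print(reading, "scale units ->", delta_r_ohms(reading), "ohms")
```

The sketch deliberately stops at the unit conversion; the assumption that deflection is roughly proportionate to degree of frustration is the author's and is not modeled here.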


Procedure 


Selection of Students. Twenty superior read- 
ers were selected from students scoring above 
the 75th percentile on Test III (reading) of 
the Ohio State Psychological Examination and 
twenty inferior readers were chosen from those 
scoring below the 25th percentile. Such 
factors as age, sex, and academic training were 
considered in making up the groups of superior 
and inferior readers. No attempt was made 
to control the factors of scholastic aptitude or 
general intelligence. First, second, third, and 
fourth year college students constituted both 
groups although the number of freshmen was 
equal to that of all upperclassmen. 

Application of Technique. As the apparatus
was being applied, the examinee was given
paragraph 2 and the following directions,
“Read the paragraph on the card aloud.
Avoid all reading errors. After you have
finished, you will be asked questions concerning 
the material read.” The response to each 
paragraph was recorded in terms of number of 
errors, time required for reading, change in 
palmar skin resistance and comprehension 
score. A record of these data in the case of a 
freshman with a percentile of 8 on the reading 
section of the Ohio State Psychological Ex- 
amination is shown in Table 1. 


Results


Table 1

Data Resulting from Application of Technique
in Individual Case

Paragraph   Errors   Time in Sec   Delta R   Comprehension (Weighted Score)

4 17 10
6 19 16
8 21 24
30 32
12 § 28 32

Data resulting from the administration of a
combined oral reading and psychogalvanic re-
sponse technique in the study of superior and
inferior readers at the college level have been
summarized as shown in Table 2. In deter-
mining scores in comprehension, weighted
values of 1 to 5 were assigned to questions on 
paragraphs 4 through 12, respectively. Be- 
cause paragraph 2 was used in preparation for 
the examination, data resulting from its use 
were not included. The mean and sigma were 
determined for each distribution and the 
standard error of the difference of the means 
of small samples was found (2). In de-
termining whether or not the differences be-
tween groups were significant, t was calculated
for each difference. Differences between the
means in the tabulation of the total number of
errors, average time and comprehension are
significant at the 1 per cent level. The average
change in palmar skin resistance as shown by
superior and inferior readers as they read
paragraphs 4, 6, 8, 10, and 12 is not statistically
significant (t = .66). However, on the more
difficult paragraphs 8, 10, and 12 the difference
in the means is statistically significant (t = 1.80)
at a point between the 5 and 10 per cent levels.

An analysis of data resulting from this
study shows three trends which may be signifi-
cant. Eight good readers and thirteen poor
readers show an increase in frustration as
number of errors increased and a loss in com- 
prehension occurred. This suggests that more 
poor readers are frustrated by these reading 
disabilities than good readers. It is also ap- 
parent that 7 good readers and 6 poor readers 
show a decrease in frustration as errors in- 
creased and a loss in comprehension occurred. 
Consequently it may be that some individuals 
in both groups are not emotionally affected by 
their errors and inability to comprehend. 
Furthermore, 1 poor reader and 2 good readers 
who show increased frustration demonstrate 
decreases in both errors and in comprehension. 
This may indicate that in their cases emotional 
tension is due only to inability to understand 
what is read. 
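
The scoring and the group comparison described in this section can be sketched in Python. Two points are assumptions rather than statements from the text: the weighted comprehension score is taken to be the number of correct answers on each paragraph multiplied by that paragraph's weight, and the small-sample test is taken to be the usual pooled-variance t (the exact formula drawn from Garrett (2) is not reproduced in the article). All scores in the example are hypothetical.

```python
import math
from statistics import mean

# Weights 1 to 5 for questions on paragraphs 4 through 12, as stated in the text.
WEIGHTS = {4: 1, 6: 2, 8: 3, 10: 4, 12: 5}


def weighted_comprehension(correct_by_paragraph):
    """Sum the correct answers on each paragraph times that paragraph's weight."""
    return sum(WEIGHTS[p] * n for p, n in correct_by_paragraph.items())


def pooled_t(group_a, group_b):
    """t for the difference between two small-sample means (pooled variance)."""
    n_a, n_b = len(group_a), len(group_b)
    m_a, m_b = mean(group_a), mean(group_b)
    ss_a = sum((x - m_a) ** 2 for x in group_a)  # sum of squared deviations
    ss_b = sum((x - m_b) ** 2 for x in group_b)
    pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (m_a - m_b) / se_diff


if __name__ == "__main__":
    # Hypothetical reader: correct answers out of five on each paragraph.
    print(weighted_comprehension({4: 5, 6: 4, 8: 3, 10: 2, 12: 1}))
    # Hypothetical error counts for two small groups.
    print(round(pooled_t([3, 5, 4, 6], [9, 12, 10, 14]), 2))
```

With twenty readers in each group, as in this study, the resulting t would be referred to a table with 38 degrees of freedom.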


Summary 


1. As generally expected such factors as 
number of errors, rate of reading and compre- 
hension scores differentiate significantly su- 
perior and inferior readers. 

2. Average change in palmar skin resistance 
cannot be expected to differentiate superior and 
inferior readers except as the material becomes
comparatively more difficult. 

3. Nevertheless, this study is significant 
because it suggests the importance of apply- 
ing measures of frustration simultaneously 
with measurement of reading achievement. 
Consideration of frustration in human behavior 
is in keeping with the contributions of Maier 
in his studies of animal behavior. 

4. The technique described in this study
provides objective data as to how much the
accumulative effect of certain reading errors
and inabilities affects the reader. This infor-
mation may be of value not only in the diag-
nosis of reading disability but in its correction
as well. As a result of this information,
therapy can be directed toward the reduction
of frustration, for example by providing easier
material, and not merely toward the eradica-
tion of behavior symptoms such as rate, errors
and comprehension.


Table 2

Data Resulting from Administration of Technique to Good and Poor Readers

Good Readers: Mean, Sigma     Poor Readers: Mean, Sigma

Total Number of Errors 4
Average Time in Seconds
Comprehension Score
Average Change in Palmar Skin Resistance
for Paragraphs 8, 10, 12

~ 3.29 13
5.85
4.73
11.28
2.30
14.08
23.7
31.5
39.5
17.18 18.83


Received May 1, 1950. 
Early publication 


References 


1. Carter, H. L. J. A combined projective and psycho-
galvanic response technique for investigating
certain affective processes. J. consult. Psychol.,
1947, 11, 270-275.

2. Garrett, H. E. Statistics in psychology and educa-
tion. New York: Longmans, Green, 1947.

3. Gray, William S. Standardized oral reading para-
graphs. Bloomington, Illinois: Public School
Publishing Company.

4. Maier, Norman R. F. Frustration. New York: 
McGraw-Hill Book Company, Inc., 1949. 











Geographical Sampling in Testing the Appeal of Radio Broadcasts 


John Gray Peatman 
City College of New York 


and 


Tore Hallonquist 
Columbia Broadcasting System 


For more than a decade hundreds of national 
and local radio programs broadcast over the 
outlets of the Columbia Broadcasting System 
have been studied with the Program Analyzer 
method for the purpose of determining their 
strong and weak points and improving them in 
the light of listeners’ reactions and comments. 
Until 1947 all of these tests were conducted in 
the New York studios and consequently the 
question arose as to whether audiences in other 
parts of the country would give reactions 
similar to those obtained from New York 
listeners. To get at least a preliminary answer 
to this question, CBS employed the Program 
Analyzer technique for eight weeks in Holly- 
wood and two weeks in Boston during the latter 
part of 1947.

Thus, this article describes the results of a 
series of Program Analyzer tests with New 
York audiences, Los Angeles audiences, and 
Boston audiences. The New York and Los 
Angeles audiences were presented two network 
programs. One of these, a comedy-drama, 
originated in Hollywood and featured mainly 
West Coast talent. The other network pro- 
gram, a musical variety show, originated in 
New York and featured mainly East Coast 
talent. For the comparison of Boston and 
New York listeners an audience participation 
program local to Boston was used. One might
expect considerable differences among these 
samples of the radio audience because of local 
familiarities and appeals. The Hollywood 
program, for example, might be expected to 
have greater appeal on the West Coast and 
the New York program to have greater appeal 
on the East Coast. Similarly, one might 
expect the local Boston program to have a 
much stronger appeal for Boston audiences 
than for New York audiences. 


The Program Analyzer Technique 


Before describing the results of the above 
comparisons and considering the question 
whether the Program Analyzer technique has 
general or only limited usefulness in testing 
the appeal of radio programs, we shall briefly 
review the Program Analyzer technique, 
originally developed in 1937 by Frank Stanton, 
now President of the Columbia Broadcasting 
System, and Paul Lazarsfeld, now Chairman 
of the Department of Sociology at Columbia 
University. 

The Program Analyzer method brings a 
sample of listeners into direct contact with a 
radio or television program under scientifically 
controlled conditions and records the listeners’ 
reactions to each successive second of the 
broadcast. 

There are two versions of the Program 
Analyzer: “Little Annie” which has been con- 
tinuously in operation at CBS since 1940 and 
“Big Annie” developed by CBS engineers and 
used at CBS since 1944. 

“Little Annie” consists of a moving tape and
a battery of 20 capillary pens. Ten of these 
pens draw continuous red lines which follow 
guide lines on the tape. The other ten pens 
draw lines in green. Each red and green pen 
is electrically connected with a red and green 
push-button in the Program Analyzer studio, 
so that when a button is pressed down the pen 
jogs off the guide line on the tape and stays 
off as long as the electrical circuit remains 
closed. Two “Little Annies” are used simul- 
taneously and can record reactions of 20 
listeners. 

Regularly, broadcast spot announcements
invite listeners to participate in a Program
Analyzer test. People who respond to these
announcements are classified on the basis of
sex, age, education, occupation, and avail-
ability at specific times. This information is
punched on IBM cards, one card for each 
person. Each time a test is scheduled, the 
cards are run through the sorting machine to 
yield a sample of listeners that is controlled 
with respect to such factors as sex, age, and 
education—all of which have been found rele- 
vant for stratification sampling of radio 
listeners. 
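
The card-sorting step amounts to drawing a quota from each stratum of the respondent file. A minimal Python sketch follows, under stated assumptions: the record fields, the quota table, and the function name are hypothetical, and random selection within a stratum stands in for whatever order the sorting machine actually produced.

```python
import random
from collections import defaultdict


def stratified_sample(people, quotas, keys=("sex", "age_group", "education"), seed=0):
    """Draw listeners so that each stratum named in `quotas` meets its quota."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for person in people:
        # One "card" per person, filed under its combination of factors.
        by_stratum[tuple(person[k] for k in keys)].append(person)
    sample = []
    for stratum, wanted in quotas.items():
        pool = by_stratum.get(stratum, [])
        if len(pool) < wanted:
            raise ValueError(f"not enough respondents in stratum {stratum}")
        sample.extend(rng.sample(pool, wanted))
    return sample


if __name__ == "__main__":
    # Hypothetical respondent file of ten cards.
    cards = [
        {"id": i, "sex": "F" if i % 2 else "M",
         "age_group": "26-40", "education": "high school"}
        for i in range(10)
    ]
    wanted = {("F", "26-40", "high school"): 2, ("M", "26-40", "high school"): 3}
    for card in stratified_sample(cards, wanted):
        print(card)
```

Availability at specific times, which the text also records on the cards, could be handled by filtering the respondent list before it is passed to the function.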

Listeners are invited in groups of from 10 to 
20 people. They are seated in the Program 
Analyzer studio, offered cigarettes and put at 
ease. Each person is given a red push-button 
for the left hand and a green button for the 
right hand. 

The group is usually given a pre-test ques- 
tionnaire containing questions about home 
listening habits, program preferences, attitudes, 
and so forth. A recording played over the 
loud-speaker system next informs the listeners 
about the test procedure. They are asked to 
listen closely to the program, to press the green 
button and keep it pressed down when they 
think a program part is good—when they want 
to listen to it; to press the red button when they 
think a program part is poor—when they don’t 
want to listen to it; to press neither button 
when they are indifferent to what they hear. 
The lights are dimmed and a recorded version 
of the program is then presented. One of the 
prime assets of the technique is that listeners, 
caught up in the momentum of the program, 
make spontaneous, non-reflective responses. 
The Program Analyzer, it may be said, pro- 
vides a psychological situation favorable for 
eliciting a true response, even from listeners 
who normally would not be able to articulate 
their reactions and ideas. Because the pro- 
cedure itself—the pressing of the buttons--is 
an extremely simple, spontaneous, and con- 
fidential procedure, the listener is almost irre- 
sistibly led on to take a stand in regard to the 
various aspects of the program. Once he has 
committed himself, oral articulation during 
subsequent interviewing is facilitated. 

The interviewer in charge starts the record-
ing apparatus in an adjoining room. As the
program proceeds he notes down the positive 
and negative reactions of each listener, as 
indicated by the jogging pens. Thus, he 
knows at the conclusion of the show the
spontaneous reaction pattern of each subject.
He goes back into the studio, the lights are
turned on, the main questionnaire is distrib-
uted, and an attempt is made to elicit the
listeners’ considered opinions about what they 
heard: their attitudes towards the program as a 
whole, what they liked and did not like about 
it, their reactions to specific elements, their 
opinions of the cast, etc. 

Finally, there is a period of oral interviewing. 
First calling on a listener whose reactions to the 
program were unfavorable, next calling on a 
listener whose reactions indicated satisfaction 
with the show, the interviewer encourages each 
person to talk informally about the program. 
Thus, two opposing points of view are estab- 
lished right at the start so that no listener need 
feel that his opinion is in the minority. Each 
participant is asked to tell how he felt about 
each aspect of the program and, in turn, is 
asked to give his conscious reasons for his 
reactions recorded on the tape. 

The questions and answers during the inter- 
view period are taken down in shorthand and 
transcribed. 

Thus, the Program Analyzer technique 
(“Little Annie”) yields three sets of data: (1) 
the second-by-second approval, disapproval, 
and indifference reactions of each listener as 
recorded on the Program Analyzer tape; (2) 
listener attitudes and opinions as expressed in 
writing in the test questionnaire; and (3) 
listener attitudes and opinions as expressed in 
the oral interview. 
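
The first of these data sets, the second-by-second record on the tape, lends itself to a simple tally. The Python sketch below is an illustration under assumptions: each listener's record is represented as lists of (start, end) intervals in seconds for the green and red buttons, a representation chosen for the example, and the listener is assumed never to hold both buttons at once. It does not compute the Program Analyzer score, whose formula is not given in this article.

```python
def pressed(intervals, second):
    """True if any (start, end) interval covers the given second."""
    return any(start <= second < end for start, end in intervals)


def tally_reactions(listeners, duration):
    """Count approving, disapproving and indifferent listeners for each second."""
    tallies = []
    for second in range(duration):
        like = sum(pressed(p["green"], second) for p in listeners)
        dislike = sum(pressed(p["red"], second) for p in listeners)
        tallies.append({
            "like": like,
            "dislike": dislike,
            # Neither button pressed is read as indifference.
            "indifferent": len(listeners) - like - dislike,
        })
    return tallies


if __name__ == "__main__":
    # Two hypothetical listeners over a three-second stretch of program.
    records = [
        {"green": [(0, 2)], "red": []},
        {"green": [], "red": [(1, 3)]},
    ]
    for second, counts in enumerate(tally_reactions(records, 3)):
        print(second, counts)
```

Summing such tallies over the seconds belonging to one program unit would give the kind of unit-by-unit profile that the trend charts display.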

The Program Analyzer technique and the 
type of information obtained when “Big 
Annie” is used are the same as for “Little 
Annie,” except that approval and disapproval 
reactions of the group as a whole are totalized 
instead of being differentiated for each listener. 
Furthermore, due to the size of the group 
present in the studio, it is not feasible to inter- 
view each subject at the end of the test session. 
Thus, the investigator relies mainly on the 
written questionnaire results for the verbalized 
opinions of the listeners. 


The Hollywood Program 


This national network comedy program
originally developed by the CBS Network
Program Department in Hollywood has main-
tained a very high listener appeal for a number
of years as indicated by audience ratings.


Table 1

Composition of the Two Samples for the Test
of the Hollywood Program

New York Sample (N = 76)     Los Angeles Sample (N = 95)

Sex:
Male 42% 47%
Female 58 53
100%

Age:
Under 26 16%
26-40 34
Over 40 3: 50

Education:
Grammar School
High School 40
College 29

100% 100%

A broadcast of the program was tested simul- 
taneously with a New York sample of 76 
listeners and a Los Angeles sample of 95 
listeners. The Los Angeles sample was tested 
with “Big Annie” and the New York sample
with “Little Annie” in six successive sessions 
with approximately 12 subjects at each session. 
The composition of each sample, with respect 
to sex, age, and education is shown in Table 1. 
Some of the subjects were regular listeners to 
this program, others had never heard it. 

On the whole, the Los Angeles sample was 
more familiar with the program than the New 
York sample as shown in Table 2. The over- 
all reaction of the listeners to the program, as 
measured by the Program Analyzer scores,¹
was identical for both samples, i.e., both tests
yielded an average attitude score of 32 which
is well above average for a comedy program.

¹ This average attitude score takes into account the
proportion of listeners with (a) positive reactions;
(b) indifferent or no reactions; and (c) negative reac-
tions. See Peatman and Hallonquist, The patterning
of listener attitudes towards radio broadcasts, Stanford
University Press, 1945, page 33.


Table 2

Prior Listening Habits to the Hollywood Program

                                        New York    Los Angeles
Listen to Program                       Sample      Sample
Regularly (almost every broadcast)      23%         26%
Frequently (about every other week)     13          33
Occasionally (once in a while)          33          24
Never                                   31          17
                                        100%        100%


But just as striking is the comparison of the
minute by minute reactions of the two samples
to the sequences of events on the broadcast.
These reactions are portrayed in Figure 1.
The trend of listener reactions of each sample 
is given by the heavy line graph on the two 
charts of Figure 1. This trend is measured by 
the Program Analyzer scores for successive 
units of the program. It will be observed that 
the trend lines for both samples are very 
similar, despite the fact that 31 per cent of the 
New York sample had not heard the program 
before. The Los Angeles sample responded 
more quickly in their approval of the program, 
a result probably attributable to the greater 
familiarity of the Los Angeles audience with 
the program and with its leading personality. 
The low points in each case are the commercials, 
a typical result in Program Analyzer tests of 
commercial programs. The over-all reaction 
of the listeners as obtained from the question- 
naire administered at the end of each test 
session is practically the same for both samples, 
as indicated in Table 3. Conditional listeners 
are those who would listen to future broadcasts 
if the programs were “improved a bit.” The 
main appeals of the program to both samples 
were very similar, as indicated in Table 4 which 
shows the relative appeal of the principal
aspects of the broadcast. The story and plot
had relatively the least appeal for both samples.
It is to be emphasized that these figures de-
scribe the relative appeal of the program ele-
ments as obtained from the questionnaire
which asked: “This is what I enjoyed most
about the show”:—The gags and jokes;—The
story and plot;—The personalities and char-
acters;—Enjoyed none of these. Each subject
checked one answer to indicate which of the
foregoing elements was most enjoyable.


Table 3

Per Cent Satisfied, Conditional, and Dissatisfied
Listeners to the Hollywood Program

New York Sample     Los Angeles Sample

Satisfied listeners 53%
Conditional listeners 27
Dissatisfied listeners


Fig. 1. Comparison of Hollywood and New York listeners: Hollywood comedy drama. Top: profile of
listener reactions, Hollywood listeners. Bottom: profile of listener reactions, New York listeners.


Table 4

The Relative Appeal of Program Elements of the
Hollywood Program

                                New York    Los Angeles
                                Sample      Sample
The gags and jokes              49%         49%
Personalities and characters    37          38
Story and plot                  9           9
Enjoyed none of these           5           4
                                100%        100%

The subjects were also asked to indicate 
whether or not they were anxious to hear the 
outcome of the plot. Their replies are de- 
scribed in Table 5. 

Finally, the great similarity of the reactions 
of both samples to the broadcast is shown by 
Figure 2, which describes the comedy appeal 
of each of the characters on the program. 
Each percentage value is the per cent checking 
each character as “very funny.” It will be 
observed that the ranks for each of the five 
characters are the same for both samples even 
though there are some differences in the per- 
centage value for each. 

From the foregoing figures and tables we see
a remarkable similarity of both the over-all
and detailed reactions of two different, geo-
graphically located samples hearing the same
broadcast. Such identity of response certainly
is not to be expected for all programs, particu-
larly for programs that may have sectional or
local flavor in their content. It is also to be
noted that both samples were city samples and
such similarity of response might not be ex-
pected in the comparison of rural or small-town
subjects with those of urban dwellers.


Table 5

Interest in the Outcome of the Story of the
Hollywood Program

                            New York    Los Angeles
                            Sample      Sample
Anxious to hear outcome     72%         73%
Not anxious                 28          27
                            100%        100%


Fig. 2. Comedy appeal of program characters on
Hollywood program (per cent checking each character
as “Very Funny”).


Table 6

Composition of the Two Samples for the
New York Variety Show

New York Sample (N = 71)     Los Angeles Sample (N = 94)

Sex:
Male 37%
Female 63
100%

Age:
Under 26 28%
26-40 37
Over 40 35
100%

Education:
Grammar School 13%
High School 69
College


Fig. 3. Comparison of reaction trends for New York and Hollywood listeners,
New York musical variety show.


The New York Variety Show 


In the second radio program to be considered 
here, comparisons again were made between 
New York listeners and Los Angeles listeners. 
This time, the program originated in New York 
City and its chief talent had obtained millions 
of East Coast fans long before the West Coast 
ever heard him. However, as in the case of 
the Hollywood program, the New York show 
has also developed a great deal of national
appeal over the CBS network. A particular
broadcast was tested simultaneously in New
York and Los Angeles. The two samples con-
sisted of 71 listeners in the New York test and
94 listeners in the Los Angeles test. The com-
position of the two samples is given in Table 6.


Table 7

Prior Listening Habits to the New York Variety Show

                                        New York    Los Angeles
Listen to the Program                   Sample      Sample
Regularly (almost every broadcast)      35%         24%
Frequently (about every other week)     20          13
Occasionally (once in a while)          28          40
Never                                   17          23
                                        100%        100%

As for previous familiarity with the program, 
we see in Table 7 that the situation is reversed 
from the Hollywood program samples: more of 
the Los Angeles sample had never heard the 
New York show before and a greater propor- 
tion of the New York sample listened regularly 
or at least “about every other week.” 

As in the preceding tests of the Hollywood 
program, “Big Annie” was used for the Los
Angeles sample whereas “Little Annie” was 
used in a series of seven successive sessions 
with the New York sample. The over-all 
results of the tests, as measured by the Program 
Analyzer score, were not too dissimilar. 

The reactions of the two samples to the successive program units of the broadcast are portrayed by the line graphs of Figure 3, each of which is based upon the average Program Analyzer scores for each program unit. The general trend of reactions for both samples is essentially the same, but the level of the trend for the Los Angeles audience is somewhat lower than that for the New York sample. The high points and the low points in nearly all cases are found to be practically identical. In other words, the implications of the results for these two samples are very similar in that any recommendations for the improvement of the program, based on the Program Analyzer tests, would necessarily point to similar aspects of the broadcast.


Table 8
Per Cent Satisfied, Conditional, and Dissatisfied Listeners to the New York Variety Show

                          New York   Los Angeles
                          Sample     Sample
Satisfied listeners       50%        39%
Conditional listeners     25         25
Dissatisfied listeners    25         36
                          100%       100%

Differences between the results of the two samples are brought out rather clearly in a comparison of the relative appeals for each of the program elements as shown in Table 9. The M.C. (Master of Ceremonies) had the most appeal for the New York sample whereas the musical numbers had relatively the greatest appeal for the Los Angeles sample.


Table 9
The Relative Appeal of Program Elements, New York Variety Show

                                   New York   Los Angeles
                                   Sample     Sample
M.C.’s personality and humor       43%        28%
The musical numbers                40         52
Comedy dialogue                    10         7
Comedy skit                        6          6
Other appeals                      1          7
                                   100%       100%


Table 5, for the Hollywood comedy program, compared the two samples on the basis of their interest in the outcome of the plot. In the case of the New York musical variety show, there is, of course, no plot, but a comparison can be made of the listeners’ judgment of the musical numbers and the comedy skit. Each subject was asked which performance he liked best and which performance he liked least. The results are brought together in Table 10.


Table 10
Performances Liked Most and Liked Least

Los Angeles Sample   New York Sample

Performances liked best:
No. 1 Third musical number
No. 2 First musical number
No. 3 Comedy skit
No. 4 Second musical number

Performances liked least:
No. 1 Comedy skit
No. 2 First musical number 13
No. 3 Third musical number 14 14
No. 4 Second musical number 14 15
100% 100%

It will be seen that, despite some differences 
in the trend of their over-all Program Analyzer 
reactions to the program and despite the differ- 
ence in their reactions to the M.C., the two 
samples were in close agreement with respect 
to the individual performances. As is evident 
from the data of Table 10, they ranked the 
entertainers they liked best in the same order 
and they ranked the entertainers they liked 
least also in practically the same order. 


The Boston Program 


This local daytime audience-participation 
show was developed by CBS’s owned and 
operated Boston station WEEI and had been 
heard for more than a year at the time the 
Program Analyzer tests were made. In the 
case of this comparison, therefore, the Boston 
sample was exposed to a program with which 
some were familiar and the New York sample 
was presented a program which none had 
heard previously. Both samples consisted of 
71 listeners: “Little Annie” was used for both 
tests in seven successive sessions with approxi- 
mately 10 subjects for each. The composition 
of the sample with respect to sex, age, and 
education was as shown in Table 11. 

Table 11
Composition of the Two Samples for the Boston Program

                      New York   Boston
                      Sample     Sample
                      N = 71     N = 71
Sex:
  Male                17%        3%
  Female              83         97
                      100%       100%
Age:
  Under 26            14%
  26-40               41
  Over 40
Education:
  Grammar School
  High School
  College             17
                      100%       100%


Inasmuch as the New York listeners had not heard the program before, a somewhat different type of question was asked for comparative purposes, namely, “How do you feel about quiz programs broadcast in the daytime?”

The Boston program featured two quizzes— 
a music quiz and a radio star quiz—as well as 
two contests in the nature of stunts—a dough- 
nut-dunking contest and a cake-slicing contest. 
A special feature of the program was a “travelling
mike,” consisting of a search among the
members of the studio audience for the woman 
with the most children. A total of 11 local 
contestants appeared on the program and were 
interviewed by the quizmaster prior to each 
contest. The prizes consisted of merchandise. 
A special program unit of the broadcast was 
the “Radio Mirror” plug—a reference to the 
fact that the program had received national 
recognition in the August issue of this maga- 
zine. The first half of this thirty-minute pro- 
gram was sponsored and included three com- 
mercials for a nationally advertised brand of 
bread. The second half was sustaining. 

The over-all reaction of listeners to the program, as measured by their Program Analyzer scores, was as follows: Boston listeners 33 (a very good rating for this type of show), New York listeners 25 (about average). The program thus had a stronger over-all appeal for Bostonians than for New Yorkers, which is not unexpected in view of the local flavor of some of the content and subject matter characteristic of the show. This is true not only in terms of the over-all reaction of the two samples but also in terms of their reactions to the individual program units described in Figure 4.

The two trend lines are essentially parallel. 
The highspots and lowspots of the program are 
practically identical for both Boston and New 
York listeners. The musical quiz had the most 
appeal for both samples; the doughnut-dunking 
contest and the cake-slicing contest had the 
least appeal, aside from the commercials. The 
only difference of any consequence was in the 
relative appeal of the “plug” for Radio Mirror 
magazine which had some appeal for Boston 
listeners (local pride) but was of little interest 
to New York listeners. 

The over-all reaction of listeners to the 
program, obtained from the questionnaire ad- 
ministered at the end of the test sessions, 
further established a difference between the 
reactions of the two samples, as indicated in 
Table 13. 

A majority of the Boston sample was satisfied with the program whereas a third of the New York listeners were dissatisfied. Despite this fact, however, the main appeals of the program were similar for both samples. This has been seen in Figure 4 and is further confirmed in the questionnaire data summarized in Table 14.

Self-testing, which has been found to be a principal appeal in most quiz programs, was checked by nearly four-fifths of both groups as having been one of the items adding most to their enjoyment. Self-testing rated the highest of any element for satisfied, conditional and dissatisfied listeners in both New York and Boston samples. More than two-thirds of each sample derived a great deal of enjoyment from “hearing people have a good time” (described as empathy in Table 14). On the other hand, the two aspects of the program that had the least appeal were the prizes and the contests.


Table 12
Opinions about Day-Time Quiz Programs

                                              New York   Boston
                                              Sample     Sample
Such programs among my favorites              54%        43%
Like them as well as other daytime shows      31         39
Do not like them as well as other daytime
  shows or never listen to them               15
                                              100%


Fig. 4. Comparison of reaction trends for Boston and New York listeners, Boston audience participation program. (Vertical axis: per cent and rating.)

The brunt of the success of any audience participation show rests largely on the M.C., and it is interesting to note that the two samples reacted similarly to the M.C. of this program, although again the Boston audience reacted somewhat more favorably, probably because of its greater familiarity with his personality prior to the time of the Program Analyzer test. In Table 15, four aspects of the M.C. are considered; the per cent of each sample enjoying each is indicated. The M.C.’s humor and jokes were least appreciated, but even so, nearly three-fifths of the Boston audience found them “very enjoyable.”


Table 13
Per Cent Satisfied, Conditional, and Dissatisfied Listeners to the Boston Program

                New York   Boston
                Sample     Sample
Satisfied       46%        51%
Conditional     20         32
Dissatisfied    34         17
                100%       100%


Table 14
Gratifications and Appeals of the Boston Program

                  New York
Self-testing
Empathy
Human interest
Information
Humor
Prizes
Contests


Table 15
Favorable Opinions about Personality and Performance of the Master of Ceremonies

                                       New York   Boston
                                       Sample     Sample
M.C.’s personality                     67%        79%
M.C.’s handling of contestants         65         79
M.C.’s voice and manner of speaking    67         74
M.C.’s humor and jokes                 40         57


Summary 


The results of the New York-Boston tests 
point to the advisability of testing “local” 
programs with local audiences if accuracy for 
general level of response and program appeal 
is the principal question. On the other hand, 
the program’s producer could ascertain the 
most important strong and weak points of the 
program just as well from the New York 
sample as from the local Boston sample, inas- 
much as the high- and low-spots tended to 
parallel each other up to the very closing of the 
broadcast. We are of the opinion that this 
would generally be the case except for programs 
that are purely local in their content, orienta- 
tion and atmosphere. Other Program Ana- 
lyzer tests have established, for example, that 


quiz questions of the local sort, that cannot be 
answered other than by local audiences, have 
little or no appeal for outsiders. If self-testing 
were not the main appeal of quiz programs, 
then this might not make such a difference, 
but self-testing has repeatedly been found to be 
the principal appeal. 

The results of the New York-Los Angeles tests demonstrate that national network programs can be satisfactorily analyzed, at least for urban audiences, with samples of subjects drawn from different geographic areas. We are of the opinion that the Program Analyzer technique has a general usefulness for the analysis of such programs regardless of the urban area from which the sample may be drawn. Unfamiliarity of listeners with the principal character (or characters) of a program will tend to affect somewhat the general level of listeners’ likes and dislikes, but not decisively so, particularly since the major high- and low-spots of a program evidently will be the same. From the point of view of diagnosing the appeal of a program as a whole and its various parts, as well as making recommendations for the improvement of future broadcasts, this is the primary consideration.

Received April 1, 1950.

Early publication. 











The Effect of Color in Direct Mail Advertising 


J. William Dunlap 
Harvard School of Public Health 


There have been a number of discussions concerning the value of color in direct mail advertising. Birren presents the results of several studies which indicate that color pulls more returns than black and white in this type of advertising.¹ The results of the several studies presented by Birren were, in some cases, slightly contradictory. Furthermore, pertinent data needed for determining the statistical significance of the differences were not presented. Among the colors Birren found to be best were: yellow, goldenrod, blue, and cherry-red. It was felt that a study should be conducted in which statistical tests could be applied to determine the significance of the results.

The colors tested in the present study were: yellow, blue, and cherry with white as the base or control “color.” It was intended to use the psychological primary colors of red, blue, and yellow. However, since the true primary colors could not be obtained from the paper manufacturer, colors were chosen that were as close to the psychological primaries as was possible. The colors blue and yellow closely match the primaries, but the cherry is somewhat different.

It is necessary that the reader be given some idea of the relative “brightness contrast” between the print used and the color of the cards. Paterson and Tinker stated that legibility and speed of reading depend upon the “brightness contrast” between the print used and the background for the print.² The background colors for the black print used in this study are all comparatively light. However, these colors approach maximum chroma, or saturation. There is little difference in the “brightness contrast” between the white and yellow cards and between the blue and cherry cards. The contrast of the print on the blue and cherry cards, however, is not as great as the contrast between the print and the white and yellow cards.


¹ Faber Birren. Selling with color. New York: McGraw-Hill, 1945.

² D. G. Paterson and M. A. Tinker. How to make type readable. New York: Harper and Brothers, 1940. Ch. 10, “Color of Print and Background,” pp. 118-129.


The nature of the “advertising” material was a card notifying members of the Kansas State Alumni Association that their annual membership was expiring and this was their opportunity to renew it.³ The cards were of three-ply colored Hammermill Index Bristol. The dimensions of the cards were 3X35’. The message was printed in black on all cards, regardless of color.

During the first week of each month, cards were sent to all members whose membership would expire that month. Every fourth person on the mailing list received the same color of card. No person received more than one card.

Between the dates of Dec. 1, 1948 and August 
6, 1949, a total of 572 cards were mailed out. 
The distribution was as follows: 147 white, 
144 yellow, 141 blue and 140 cherry. Table 
1 shows the number of cards of each color 
that were sent out each month and the number 
returned. The highest percentage of returns, 
50.7, was for the yellow cards. The rank order 
of returns for the other colors was blue 46.1%, 
white 40.8% and cherry 38.6%. 

The question now is whether the observed differences in returns are due to chance. Under the null hypothesis it is assumed that no difference exists due to the effect of color. A simple and direct overall test for this assumption is provided by chi square. Calculation gave a chi square of 2.9, which for 3 degrees of freedom corresponds to a probability of about P = .40. Thus the null hypothesis of homogeneity cannot be rejected, and it is concluded that no difference in “pull” was found due to the use of color.
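The arithmetic of such an overall test can be sketched in a few lines of Python. This is only an illustrative check, not the writer's own computation: the numbers of cards returned (60, 73, 65, and 54) are not stated in the text and are back-computed here from the reported totals and percentages, and the function names are hypothetical. The sketch sums the chi-square terms for the returned cards alone and for the full 2 × 4 table of returned and not-returned cards. Under these assumptions the first sum comes out near the 2.9 reported, while the full-table sum is larger, so the form of the calculation bears on the P level obtained.

```python
import math

# Cards sent per color (from the text) and cards returned
# (ASSUMPTION: back-computed from the reported percentages).
sent = {"white": 147, "yellow": 144, "blue": 141, "cherry": 140}
returned = {"white": 60, "yellow": 73, "blue": 65, "cherry": 54}


def chi_square_parts(sent, returned):
    """Return (sum over returned cells, sum over not-returned cells)."""
    overall_rate = sum(returned.values()) / sum(sent.values())
    returned_sum = 0.0
    not_returned_sum = 0.0
    for color, n in sent.items():
        expected_ret = n * overall_rate
        expected_not = n * (1 - overall_rate)
        observed_ret = returned[color]
        observed_not = n - observed_ret
        returned_sum += (observed_ret - expected_ret) ** 2 / expected_ret
        not_returned_sum += (observed_not - expected_not) ** 2 / expected_not
    return returned_sum, not_returned_sum


def chi2_upper_tail_3df(x):
    """Upper-tail probability of chi square with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


returned_only, not_returned = chi_square_parts(sent, returned)
full_table = returned_only + not_returned
p_returned_only = chi2_upper_tail_3df(returned_only)
p_full_table = chi2_upper_tail_3df(full_table)

print(f"returned cells only: chi2 = {returned_only:.2f}, P = {p_returned_only:.2f}")
print(f"full 2 x 4 table:    chi2 = {full_table:.2f}, P = {p_full_table:.2f}")
```

Neither P value falls below the conventional .05 level, so under these assumed counts the conclusion about homogeneity is unchanged by the choice of calculation.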

This result does not support Birren’s contention that color affects “pull” in direct mail advertising. Birren reports a study done by a manufacturer, who tested colored envelopes for the number of responses and obtained the following results: blue 7.8%, yellow 6.8%, goldenrod 6.4%, green 6.0%, pink 5.8% and white 3.1%. He reported another study in which colors were measured in terms of the percentage of orders they produced. The results given were: goldenrod 21.42%, pink 17.83%, green 17.82%, white 17.29%, kraft 15.89% and an old envelope (color not stated) 9.75%. Birren reports still another study done by a milling company which, testing “return cards,” found that 50.6% of the returns were cherry-red cards, while white and blue pulled 32.7% and 16.7%, respectively.


³ This study could not have been conducted without the cooperation and support of Kenny L. Ford, Alumni Secretary, KSC. The writer is also indebted to Dr. Roy C. Langford for his suggestions and encouragement.


Table 1
The Number of Cards by Color Sent Out and Returned by Mailings

             White           Yellow          Blue            Cherry
Month        Sent   Ret'd    Sent   Ret'd    Sent   Ret'd    Sent   Ret'd
Dec.         8               7
Jan.                         8
Feb.         17
Mar.         28     1        28
Apr.         18              18
May          25              25
June
July-Aug.    32              32
Total        147             144             141    65       140
% Ret'd      40.8%           50.7%           46.1%           38.6%


The available evidence as to the effect of 
color in direct mail advertising is contradictory. 
It is possible that the results from the alumni 
membership cards represent a sampling error 
due to the content of the cards, the personnel 
sampled, or to the size of the sample. Ex- 
amination of all the data available gives the im- 
pression that black on yellow, buff, or golden- 
rod has a greater pull than does black on white. 
In view of the possible practical value of color 
in direct mail advertising the problem should 
be subjected to further investigation. 


Received September 22, 1949. 








Brand Discrimination among Cigarette Smokers 


C. K. Ramond, L. H. Rachal, and M. R. Marks 


Tulane University 


It is a matter of common knowledge that the 
essence of cigarette advertising is the claim 
that the particular cigarette is distinguishable 
from other brands. Habitual smokers fre- 
quently comment that they are able to identify 
their own brand. If these claims be true, it 
follows that there must be discriminable dif- 
ferences among brands, and the problem of 
ascertaining the extent of such differences is 
one of interest to psychologists. 

The writers have found only two studies 
which are immediately relevant. Husband 
and Godfrey (2) in 1934 worked with 5 differ- 
ent brands. They requested 51 Ss to attempt 
the identification of 4 cigarettes, under the 
condition that S was told only that his cigarette 
was included among the 4. The report does 
not state clearly whether the S knew just what 
brands were possible choices. Ss were blind- 
folded. The data are given in terms of per- 
centage of correct and incorrect identifications 
for each brand tested. Although no statistical 
techniques were employed to evaluate the data, 
Husband and Godfrey concluded that most 
cigarettes tested were identified correctly 
slightly more times than would be expected by 
chance. They noticed anomalous findings, 
e.g., “Camels are identified as Chesterfields 
more often than as themselves” (2, p. 222). 

There is evidence, however, that the use of 
a blindfold obscures the central problem. Hull 
(1) in 1924, while studying the physiological 
effects of tobacco smoking, found that his 
blindfolded Ss frequently could not distinguish 
between real tobacco smoke, and warm-moist 
air, when both were inhaled through a pipe 
mouthpiece. Those readers who are habitual 
smokers may recall that when they are smoking 
in the dark they are sometimes not sure of 
whether they are smoking at all! 

The present investigation was designed to test capacity for discrimination among various popular brands of cigarettes, to wit, Camels, Chesterfields and Lucky Strikes, when S was allowed to see both the cigarette and the smoke. Answers for the following questions were sought:


1. Do correct identifications exceed chance 
expectancy, and, if so, what is the margin of 
improvement over chance? 

2. Do Ss, who are permitted to smoke the 
brands interchangeably, make higher identifi- 
cation scores than Ss who are required to 
smoke a single brand until they commit them- 
selves as to its identity? 

3. Does an S who habitually smokes a given
brand identify that brand correctly more often 
than do Ss who habitually smoke some other 
brand? 

Procedure 


Subjects: Only smokers who consumed at 
least one pack of cigarettes in a four day period 
participated in the study. They were 200 in 
number, of both sexes, accidentally sampled 
from Tulane University students. Ss were 
approached outside of classrooms, in student 
centers, and on streets near the campus. 

Apparatus: This consisted of mimeographed identification questionnaires, gummed labels, candy mints, and most important, 1,200 cigarettes, 400 each of the brands named above. Plain gummed labels of identical size were pasted in identical positions over the brand names of the cigarettes used in the test-smoking. Each label bore an initial: X, Y, or Z. These non-committal identifying marks were coded to the brand names, and, for security purposes, the code was changed halfway through the study. The questionnaire was as follows:

Circle the brand name which you think the labeled cigarette is:

X               Y               Z
Camel           Camel           Camel
Chesterfield    Chesterfield    Chesterfield
Lucky Strike    Lucky Strike    Lucky Strike


Method: Two principal groups of Ss were employed. An attempt was made to have in each group Ss who habitually smoked each of the brands tested, and also Ss who habitually smoked other brands. Actually, in Group I, there were 24 Ss who smoked Camels, 27 who smoked Chesterfields, 24 who smoked Lucky Strikes, and 25 smokers of miscellaneous brands. In Group II, the corresponding breakdown was 25 smokers for each of the categories described.


Practice Smoking: Instructions given to each S in Group I were as follows: “Light all three of these cigarettes (one each of the three brands) and smoke them interchangeably. Notice the brand which you are smoking and look for characteristics which will help you identify it later. Smoke until you feel you will be able to make the identifications.” Instructions given each S in Group II were as follows: “Light only one of these cigarettes. Notice its name and smoke it until you feel that you can identify it later. Then take another cigarette and proceed in the same way. Put out one cigarette before lighting another.”

Test Smoking: Each S was given a candy 
mint to clear the taste of the practice smoking. 
The method paralleled that used in the prac- 
tice, but this time the names of the cigarettes 
were masked by the initialed gummed labels. 
The Group I Ss indicated their identifications 
on the questionnaires after smoking all three 
cigarettes interchangeably. The Group II Ss 
indicated a choice each time they completed 
one of the three cigarettes. In no case in- 
cluded in the data, did an S use a given brand 
name to identify more than one of the test 
cigarettes. None of the Ss was told of his 
degree of success. 


Results 


Table 1 shows the number of times each cigarette was identified as itself and as some other cigarette, by experimental groups.


Table 1
Number of Identifications of Cigarette Brands by Brand Name and by Experimental Group

                            Identified As
                Camel        Chesterfield    Lucky Strike
Actual Brand    I     II     I     II        I     II
Camel           48    40     24    33        26
Chesterfield    23    35     46    41
Lucky Strike    30    24


The first pertinent question is, “Do correct identifications exceed chance expectancy, and if so, what is the margin of improvement over chance?” The χ² test of independence of principle of classification was applied to the data of Table 1. With 4 degrees of freedom, χ² = 32.75. A value as great as this could be expected by chance only 1 time in 100,000. The chance expectancy is of course 33.3%. The overall average percentage of correct identifications was 44.5. Thus, the margin of increase, while highly significant, is small in magnitude. The figures for individual cigarettes and individual groups do not differ significantly from the average figure of 44.5% correct identifications. The actual percentages were: For Group I, Camels, 48; Chesterfields, 40; Lucky Strikes, 41. For Group II, 40, 41, and 50, respectively. It appears that, while no individual cigarette is more “distinctive” than any other, there is a slight but significant discriminability among the three brands tested.
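The size of the margin over chance can be illustrated with a simpler check than the χ² test of independence used above. The following Python sketch applies a one-sample normal-approximation test to the overall figure of 44.5% against the chance expectancy of one-third. It assumes 600 identifications (200 Ss, three cigarettes each) and treats them as independent trials, which is only approximately true because no S could use a brand name twice; the function and variable names are illustrative.

```python
import math

# Figures from the text; independence of the 600 identifications
# (200 Ss x 3 cigarettes) is an ASSUMPTION made for this sketch.
n_identifications = 200 * 3
observed_rate = 0.445
chance_rate = 1 / 3


def one_sample_z(p_observed, p_null, n):
    """Normal-approximation z statistic for a proportion against a null value."""
    standard_error = math.sqrt(p_null * (1 - p_null) / n)
    return (p_observed - p_null) / standard_error


z = one_sample_z(observed_rate, chance_rate, n_identifications)
margin_points = (observed_rate - chance_rate) * 100
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))

print(f"margin over chance: {margin_points:.1f} percentage points")
print(f"z = {z:.2f}, one-sided P = {p_one_sided:.1e}")
```

Under these assumptions the margin is about 11 percentage points, and the z value lies well beyond conventional significance levels, in keeping with the description of the increase as highly significant but small in magnitude.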

The second question is, “Do Ss who are permitted to smoke the brands interchangeably make higher identification scores than Ss who are required to smoke a single brand until they commit themselves as to its identity?” This question may be answered by comparison of Groups I and II. Group I averaged 44% correct; Group II averaged 45% correct. The difference is not statistically significant. It appears that this limited practice is not effective in differentiating between the groups. It is quite possible that the two kinds of practice did not change the scores at all, so that if a third group had been employed in which only the test smoking was administered the identification scores would have been as high as those obtained in this study. It should be noted that the Group II conditions approximate the “real life” situation. Habitual smokers smoke only one cigarette at a time!
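The comparison of the two groups can be sketched as a two-proportion test. The text does not say which test was applied, so the pooled normal-approximation test below is an assumed stand-in, and it takes each group's 300 identifications (100 Ss, three cigarettes each) as independent trials. The function name is hypothetical.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic and its two-sided P value."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Group I: 44% correct; Group II: 45% correct (from the text).
# ASSUMPTION: 100 Ss x 3 cigarettes = 300 independent identifications per group.
z, p = two_proportion_z(0.44, 300, 0.45, 300)
print(f"z = {z:.2f}, two-sided P = {p:.2f}")
```

A difference of one percentage point on samples of this size gives a z value close to zero, which agrees with the statement that the difference is not statistically significant.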

The third question is, “Does an S who habitually smokes a given brand identify that brand correctly more often than do Ss who habitually smoke some other brand?” Of the Camel smokers, 75% were able to identify their own brand. Corresponding figures for Chesterfields and Lucky Strikes were 70% and 74%. There is no significant difference among these percentages. Smokers of particular brands tested (i.e., excluding the miscellaneous smokers) identified the popular brands other than their own less frequently than did the habitual smokers of those brands. Of the combined Chesterfield and Lucky Strike smokers, only 39% identified Camels correctly; of the combined Camel and Lucky Strike smokers, only 42% identified Chesterfields correctly; of the combined Camel and Chesterfield smokers, only 44% identified Lucky Strikes correctly. There are no significant differences among these percentages, but all of them are significantly less than the percentages of correct identifications of “own” brands given above. Miscellaneous smokers did even less well. Camels were identified correctly by only 24%, Chesterfields by 14%, and Lucky Strikes by 22%. Again, there are no significant differences among these percentages, but all of them are significantly less than the percentages for habitual smokers of the brands tested. The answer to the question, “Do smokers know their own brands?”, is fairly clear. Although the overall percentage of correct identifications was 43.5, the “own brand” identifications averaged 73%; identifications of popular brands by smokers of other popular brands averaged 42%; identifications by smokers of miscellaneous brands averaged 20%. There is thus a well-defined tendency for smokers of particular brands to know that brand. Smokers of miscellaneous brands score lower than could be expected by chance. They seem to have positive misinformation about the differential tastes of the popular brands.
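The three averages just given can be related to the overall figure by a weighted mean. The Python sketch below assumes the numbers of identifications implied by the Method section: 150 habitual smokers of the tested brands each made one “own brand” and two “other popular brand” identifications, and 50 miscellaneous smokers each made three. These counts are inferred rather than stated in the text, and the variable names are illustrative.

```python
# Category: (per cent correct from the text, number of identifications).
# ASSUMPTION: counts are inferred from the group sizes in the Method section
# (150 smokers of tested brands, 50 smokers of miscellaneous brands).
categories = {
    "own brand": (73, 150 * 1),
    "other popular brand": (42, 150 * 2),
    "miscellaneous smokers": (20, 50 * 3),
}

total_identifications = sum(n for _, n in categories.values())
weighted_percent = (
    sum(pct * n for pct, n in categories.values()) / total_identifications
)

print(f"{total_identifications} identifications, "
      f"{weighted_percent:.2f}% correct overall")
```

Under these assumed counts the weighted mean of the rounded category averages is close to the overall percentages of correct identifications reported in the text.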


Summary 


One hundred Ss, each of whom customarily 
smoked at least one pack of cigarettes each 
four days, practice-smoked one each of three 
popular cigarettes—Camels, Chesterfields, and 
Lucky Strikes—until they felt that they could 
identify them. In practice and in test they 
were allowed to interchange the cigarettes, 
after lighting all three. Another 100 Ss, both 
in practice and in test, smoked only one of the 
brands until practice was over or identification 
made, then proceeded to the next, etc. In the 
test situations the cigarettes were masked by 
uniformly sized and positioned gummed labels. 
All Ss recorded on a questionnaire their identi- 
fications of the three brands. 

Inasmuch as the sampling was predominantly of college students, generalization from the results is applicable only to such a population except insofar as the sample tested reasonably represents the general smoking population. Further, in a short-run test such as this, such factors as throat-irritation or, contrariwise, acquired insensitization, are minimized. In a more extended smoking test the obtained frequencies of correct identifications might well change. With these restrictions in mind, the following conclusions were reached:


1. All three brands tested were identified cor- 
rectly an average of 44% of the time as against 
a chance expectancy of 33.3%. The increase, 
though slight, is significant statistically. 

2. There were no significant differences in 
frequencies of correct identifications among 
the three brands tested. 

3. There was no significant difference in 
correct identifications between those Ss who 
smoked their cigarettes simultaneously and in- 
terchangeably, and those who were limited to 
one cigarette at a time. The latter group is 
that which corresponds more closely to actual 
smoking practice. It is suggested that the 
training used in this study was not sufficiently 
extensive or intensive to be evocative of dif- 
ferences in performance. 

4. Habitual smokers were able to identify 
their own brand significantly more times than 
smokers of other brands; this facility was 
uniform in habitual smokers of all three brands 
tested. 

5. It would appear that claims of cigarette 
advertisers and habitual smokers to the effect 
that there are discriminable differences among 
various brands are technically true, but 
actually of small magnitude. 

6. No data of this study indicated greater 
discriminability of any particular brand, or 
greater discriminatory capacity by smokers 
of any particular brand. 


Received October 20, 1949. 


References 


1. Hull, C. L. The influence of tobacco smoking on
mental and motor efficiency. Psychol. Monogr.,
1924, 33, 161.

2. Husband, R. W., and Godfrey, J. An experimental
study of cigarette identification. J. appl. Psychol.,
1934, 18, 220-223.





Report on the Journal of Applied Psychology for 1949 


A summary of the materials published in 
Vol. 33 of the Journal of Applied Psychology is 
presented in the following table together with 
the corresponding data for Volumes 27-32. 


Year    Vol.    Articles    Book Reviews
1949    33      76          33
1948    32      78          34
1947    31      80          29
1946    30      67          30
1945    29      53          18
1944    28      34          18
1943    27      58          9


The number of pages printed from 1946 
through 1949 is as follows: 


                     1946    1947    1948    1949
No. pp. (“reg.”)     476     478     485     479
No. pp. (“early”)    192     186     199     138
Total                668     664     684     617


Vol. 33 was one page shy of the budgeted 
limit of 480 pages. Early publication contrib- 
uted a total of 18 extra articles and 138 extra 
pages for our APA reader-owners and outside 
subscribers. 

Early publication in 1949 = 18 articles; in
1948 = 21 articles; in 1947 = 28 articles; in 1946
= 16 articles. The lag in publication of “early
publication” articles ranges from 2 to 4 months
with a median of 3 months.

In addition to printing “date of receipt” at 
the end of each article, “early publication” is 
also added if such be the case. Thus all infor- 
mation concerning editorial action on pub- 
lished articles is available to contributor- 
owners, to reader-owners, and to outside 
subscribers. 

The disposition of manuscripts received during 1949 and in each of the three preceding “post-war” years is as follows:


                     1946    1947    1948    1949
Accepted             67      59      89      102
Rejected             38      44      35      74
Total Received               103             176
Per Cent Rejected            43              42


The number of Mss. received shows a steady 
post-war increase throwing a constantly heavier 
burden on the editor and his Consulting 
Editors. 

A larger number of manuscripts would have 
been rejected had it not been for our policy of 
giving authors a chance to revise manuscripts 
in accordance with detailed suggestions pro- 
vided chiefly by our Consulting Editors. Of 
102 Mss. accepted in 1949, 61 were accepted 
“as is” and 41 were accepted following revision 
by the author. 

The lag in publication of articles published 
in regular turn varies from time to time because 
manuscripts are not received in an even flow. 
During 1949 the low point was 5 in December 
and the high points were 23 in April and 23 in 
October. The median lag for regularly sched- 
uled articles in Feb., Apr., and June 1947, for 
the same months in 1948, for the same months 
in 1949, for Aug., Oct., Dec., 1949, and for 
the same months in 1950 is as follows: 


                     Median Months of Lag

Feb., Apr.,  Feb., Apr.,  Feb., Apr.,  Aug., Oct.,  Feb., Apr.,
June 1947    June 1948    June 1949    Dec. 1949    June 1950
    12            9            7            9           10


Lag in publication is thus shown to be
increasing since the first half of 1949. As of
April 15, 1950 there are 39 accepted Mss. on
hand plus 14 Mss. accepted if revised, and 7
Mss. “action pending,” or a total of 60 Mss.
likely to be published in August 1950 or later.
The estimated lag for the last few Mss. accepted
during the first half of April 1950 is 12 months.

The problem of lag in publication is thus 
once again becoming acute in spite of the con- 
tinued policy of: 1. “Brevity consistent with 
clarity.” 2. Early publication at author’s ex- 
pense. 3. Use of American Documentation 
Institute for large, unwieldy, and costly tables 
and figures. 

The APA Council of Representatives Direc- 
tive adopted at the Denver meeting in Septem- 
ber 1949 will aid the editor in persuading 
authors to abbreviate their reports and to 
make increased use of ADI. During 1949, 
seven articles used ADI. During the first half
of 1950, six articles used ADI. Vigorous
action by the editor is the only way lag in pub- 
lication can again be reduced to a desirable six 
or seven month lag. If the postwar pressure 
continues to increase and we have exhausted 
all possibilities of “brevity consistent with 
clarity” and use of ADI then the Board of 
Editors and the Committee on Publications 
will be forced to consider an increase in the 
budgeted number of pages per volume. 

The problem of the long delay between
submission of edited copy to the printer and the
receipt of a given issue by subscribers
continues to be acute. An increased number of
authors write in concerning their reprint
orders because of the long time elapsing
between appearance of an article in a given issue
and receipt of reprints.




The new cover page and the new double-column
format have been received with favor, judging
from unsolicited comments. The 480
pages per volume of the old format with about 
500 words per page has been reduced to 384 
pages per volume in the new format but with 
about 750 words per page. Thus, there will 
be a slight increase in the number of articles 
that can be published per volume beginning 
with 1950. 
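
The capacity comparison between the two formats can be made explicit with a little arithmetic. The Python sketch below is illustrative only. The words-per-page figures are the report's own approximations ("about 500" and "about 750"), and the function name is hypothetical.

```python
def words_per_volume(pages, words_per_page):
    """Approximate words in one volume from page count and words per page."""
    return pages * words_per_page


old_format = words_per_volume(480, 500)  # old single-column format
new_format = words_per_volume(384, 750)  # new double-column format

# Relative change in capacity when moving from the old to the new format.
increase = (new_format - old_format) / old_format

print(old_format, new_format, f"{increase:.0%}")
```

On these round figures the new format holds about one fifth more words per volume than the old one, even though it has fewer pages.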

The editor again wishes to express his ap- 
preciation to his Consulting Editors for in- 
valuable aid in evaluating manuscripts and in 
submitting detailed suggestions for revision 
when such is indicated. 

Donald G. Paterson 
Editor 
Received April 22, 1950.





Book Reviews 




Drake, Frances S., and Drake, Charles A. A 
human relations casebook for executives and 
supervisors. New York: McGraw-Hill Book 
Company, Inc., 1947. Pp. xiv+187. $2.50. 


In reviewing a recent textbook on personnel 
management I was impressed by the thorough- 
ness of its presentation of facts, principles, and 
techniques but felt that it was very dry reading. 
The structure was there, all neatly organized 
and patterned, but it lacked the living tissue 
necessary to make it come alive. 

This “human relations casebook” tries to
add flesh and blood to the formal structure of
personnel management and does it pretty well.
It presents 75 case histories, drawn from actual
experience and descriptive of a wide variety of
situations in industry. Section I, Adjusting
the Human Resources, includes 17 cases
involving selection, transfer, emotional
deviations, training, retirement, etc. Section II,
Developing Attitudes and Sentiments, presents
11 cases illustrative of factors affecting
employee morale. Section III, Using and Abusing
Incentives, portrays 11 incidents dealing with
wage incentives and non-financial rewards.
Section IV, Bargaining with Individuals and
Groups, presents 8 cases involving handling of
grievances and labor demands. Section V,
Mobilizing the Brain Power of an Organization,
deals with the use of conferences, suggestion
systems, and work simplification programs
through discussion of 6 cases. The final
Section VI, the Ways of Executives and
Supervisors, portrays 22 specific incidents illustrative
of good and poor personal traits and behavior
patterns of managers.

Although the case histories are quite brief
and intentionally “stripped of much
background and emotionally toned descriptive
matter,” they focus attention on the
principles involved. To aid the reader in profiting
vicariously from the successes and errors of
others, each case is presented in the following
manner: first, the area illustrated by the case
is briefly outlined; second, the case history is
presented; third, interpretive comments are
made, generalizing from the specific incident;
fourth, questions for group discussion are
raised; fifth, the reader is asked to formulate
in his own words the primary lesson or principle
drawn from the case under discussion.
Facilitating the use of the book as a text for training
group or classroom discussion are well selected
and annotated bibliographies at the end of
each of the six major sections.

Learning (or teaching) through case ex- 
amples has obvious limitations. Solutions 
successful in one situation frequently fail in 
another and it is a common human error to 
generalize too quickly from a few instances. 
The authors seem to be aware of this in their 
deliberate cutting out of many of the details 
of the cases and in their emphasis upon the in- 
terpretation of the situation in terms of general 
principles. The principles do not grow out of 
the cases so much as the cases are illustrative of 
the principles. Looked at in this light, this 
reviewer felt it inappropriate to make what 
seemed on first reading to be obvious criticisms 
on the briefness of the cases and the failure to 
include certain expected types of situations. 

The real merit of this book lies in the 
soundness of the particular principles illus- 
trated, the value of the case examples in stimu- 
lating thought by the reader, and its general 
usefulness as a supplementary textbook or 
as a workbook in a supervisory development 
program. 

Albert S. Thompson
Teachers College,
Columbia University


Mossin, A. C. Selling performance and con- 
tentment in relation to school background. 
New York: Bureau of Publications, Teachers 
College, Columbia University, 1949. Pp. 
viii+166. $2.75.

Do salesgirls who have completed high 
school courses in ‘distributive’ subjects per- 
form more efficiently, and are they more con- 
tented with their work than salesgirls who 
have not had such courses? These are the two 
major questions that this study posed for in- 
vestigation. 

The subjects were 94 salesgirls from a large 
department store in New York City, rather
homogeneous with regard to such factors as
period of employment, marital status, and 
duration of high school attendance prior to 
employment. They differed in having followed 
in high school what the author classified as 
either a college preparatory, commercial, dis- 
tributive occupational, or clothing arts cur- 
riculum. 

Job performance was measured by means of 
subjective criteria derived from independent 
ratings by four trained shoppers who observed 
each girl at work. Job contentment was 
evaluated by responses to three instruments 
specifically constructed for this investigation, 
namely a Job Functions Interest Blank, a Job 
Conditions Satisfaction Questionnaire and a 
Job Ranking Test. 

No significant differences were found be- 
tween the performances of salesgirls with dif- 
ferent high school curricular backgrounds. 
Some slight tendencies were evidenced for 
girls who had taken the “distributive” cur- 
riculum to score higher on measures of job con- 
tentment than did girls from the other cur- 
ricular groups. The “distributive” group ex- 
pressed a greater desire to remain in sales work 
than did the three other curricular groups. 

The study thus lends some support to other 
investigations which have indicated that in- 
terests, although not always useful as pre- 
dictors of job performance, are indicative of 
motivations important to job contentment, 
worker morale, and reduction of employee 
turnover. However, certain weaknesses in ex- 
perimental design forced upon the investigator 
by the particular department store situation 
tend to vitiate positive conclusions from the 
data. 

The salesgirls were drawn from 41 different
departments of the store, thus making
comparisons between their selling tasks rather
tenuous in view of differences in materials sold
and problems encountered in selling. Only 13
of the 94 subjects were classified in the
“distributive” curricular group and six of these had
taken only two “distributive” courses. The
author had to rely upon subjective criteria for
the evaluation of job performance. The
reliabilities of these ratings by shoppers were
quite low, and the ratings correlated .30 or
lower with supervisors’ ratings of the same
employees.

Perhaps an inherent limitation in this
approach to comparing workers’ high school
backgrounds is that these do not in themselves
reveal attitudes toward curricula and work
that are indicative of the motivations
underlying vocational choice. We must know more
about the meaning of vocational choice to
individual students and workers if we are to
evaluate properly the contributions of their
backgrounds to job performance and contentment.


Daniel Raylesberg
B'nai B'rith Youth Organization


Vernon, Philip E., and Parry, John B.
Personnel selection in the British Forces.
London: University of London Press Ltd., 1949.
Pp. 324. 20/- net.


This book presents a summarized account of 
the application of psychological methods to 
personnel selection in the British Navy, Army, 
Air Force, and Army Territorial Service during 
World War II. The book is addressed to both 
industrialists and educators with the hope that 
the methods employed during the war in the 
British Armed Forces may be found useful in 
the peace-time selection of employees and in 
student selection and guidance. Throughout
the book the authors have attempted to present 
their material as an integral part of the whole 
field of personnel psychology. War-time ex- 
perience in the application of psychological 
methods to selection problems is, therefore, in- 
tegrated with the knowledge which existed in 
the field prior to the war. Only material per- 
taining to personnel selection is included; 
applications of psychology to training, design 
of equipment, morale and so forth are not 
treated. 

Part I is concerned with the organization of 
selection programs, the general procedures em- 
ployed and the work of psychologists in the 
Royal Navy, Army, Royal Air Force, and 
Army Territorial Service. The work of non- 
military psychologists is not described. A 
chapter on the rise of vocational psychology is 
also included. This part is designed to give 
the reader a background for understanding 
the organizations within which psychologists 
worked. Unless this part is read first it will
not be easy to read the later chapters,
particularly since many abbreviations are employed.
Appendix I, Abbreviations, can be used for
reference purposes.

In Part II the authors present the highlights
regarding the applications of psychology to
personnel selection in the British Armed
Forces. The first two chapters of this section
are excellent condensed statements of the basic
principles of vocational classification and the
value of personnel selection procedures as a
whole in the Armed Forces. Other chapters are
concerned with the biographical questionnaire, 
the interview, principles of psychological test- 
ing, and the various types of tests—intelli- 
gence and educational, non-verbal, mechanical, 
special aptitude, and temperament tests. A 
separate chapter is devoted to selection findings 
in the Royal Air Force. A chapter on conclu- 
sions completes Part II. 

In the judgment of this reviewer the authors
have made three significant contributions in
the writing of this book. First, they have
presented a brief, clear description of selection
procedures as they existed in the British Armed
Forces during World War II. The student of
military and personnel psychology will
welcome having this material in a readily available
form. Second, the book provides a good
summary of the applications of psychology to
personnel selection before the war. The
chapters on the biographical questionnaire, the
interview, and principles of psychological
testing are particularly good in this respect.
Third, the results obtained during the war are
critically evaluated and integrated with the
knowledge which existed prior to the war. In
a sense, the book almost impresses one as a
textbook in personnel psychology rather than
being a description of war experiences with the
applications of psychology to personnel
selection. The book is by no means limited to a
description of findings. The contributions of
war experiences to the total field of personnel
psychology and the peace-time applications of
these experiences are continually kept in mind
by the authors.

As one reads this book he is impressed with
the similarity of findings in the British and
American Armed Forces. The types of tests
which “worked” were much the same, the
interview was beset with identical difficulties and
the criterion was a problem for the British as
well as American psychologists. Thorough
follow-up studies were equally few in number
in the armed forces of the two countries.

The serious student of personnel psychology 
may be disappointed to find that data are 
presented only in summarized form. It is 
stated by the authors that they hope more 
detailed reports can be published elsewhere. 
The book is well written for its intended audi- 
ence and will be widely read by personnel 
psychologists, personnel administrators and 
guidance workers. It will be useful as a refer- 
ence work or supplementary text in courses 
concerned with personnel selection. 


Dewey B. Stuit
State University of Iowa


Warner, W. L., Gardner, B. B., Henry, W. E.,
and Haggard, E. A. Identifying and
developing potential leaders. New York: American
Management Association, Personnel
Series No. 127, 1949. Pp. 26. $0.75.


This issue contains a series of four papers
presented at the AMA Mid-winter Personnel
Conference with the sub-title “Social Science
and the Management of American Business—
A Report to Management from the University
of Chicago.”

The participants, and the titles, were: W.
Lloyd Warner: “Individual Opportunity—A
Challenge to the Free Enterprise System”;
Burleigh Gardner: “Conserving and
Developing our Human Resources”; W. E. Henry:
“Identifying the Potentially Successful
Executive”; and E. A. Haggard: “Social and
Psychological Factors in Work Adjustment.”

While it well may be that American industry 
is satiated or even confused by an inundation of 
“challenges,” these associates of the University 
of Chicago’s Committee on Human Develop- 
ment unhesitatingly throw out several more. 
Warner describes the uncomfortable situation
in which the usual social mobility routes upward
(occupation and education) to higher status
positions are becoming inadequate and indeed
are closed in many areas. The result is a
sort of mass frustration, a willingness to give
up the old system of individual mobility, and a
readiness to blame the system.

Gardner discusses the overall problem of loss
of satisfaction and deteriorating morale as
brought about through such organizational
problems as technological change, “bigness”
and over-extended hierarchy, decline of small,
successful, satisfying, owner-managed firms,
and the effects of specialization. Paradoxical
to the apparent purpose of the papers (as
revealed in the publication title) is “See that
the good men in your organization are found
and give them a chance to use what they have”
(p. 12) (1).

Henry brings out the problem of the
individuals who run the organization—the leaders.
He describes the qualities of executive
leadership as being to a great extent the qualities of
individual personality. Personality research
among successful men in business and industry,
according to Henry, has revealed a pattern
of common personality characteristics, and
this pattern is of fairly long standing.
Executive skills may be equal, but the
fundamental personality organizations may be such
that one executive is successful and another
isn’t. He regards the executive as a
“particular kind of personality” worthy of
personality analysis as used elsewhere. For
illustration he indicates that the successful
executive thinks in terms of the job hierarchy
continuously, and has a positive reverence for a
competent parent image. But in general he
casts research questions rather than gives leads
to identify potentially successful executives.

Haggard relates the above points to the
necessity of satisfying the run-of-the-mill
workers. He is particularly critical of
“mechanized selection” by means of test scores where
the factors of interests, motivations, and
personality structure are not estimated at the
hiring. (He speaks as if most workers got
employed because of test scores whereas only
about 15% of U. S. companies use any test at
all.) He appeals for greater management
recognition of human relationships on the job, of
emotional needs, of the complete individual
personalities. Again, “We need to locate the
men who can rise in the organization, and help
them do so” (p. 21).

All this is important and interesting. But
it doesn’t add up to “identifying and
developing potential leaders.” The audience
undoubtedly was stimulated to support research,
activities, and efforts toward this end—which
is all to the good.


Ralph R. Canter, Jr. 
University of California at Berkeley 


Hoppock, Robert. Group guidance: Principles,
techniques, and evaluation. New York:
McGraw-Hill Book Co., 1949. Pp. 393. $3.75.


Group Guidance is not a book for psychol- 
ogists. As Dr. Hoppock states in his preface, 
“this book has been written for the beginner 
(teacher) who has been assigned some re- 
sponsibility for group guidance and who wants 
to know what it is all about.” To those who 
object to the term group guidance, as does the 
reviewer, Hoppock states, “The author of 
this book has no particular desire to debate 
the issue. . . . The author prefers the term 
‘group guidance’ because it is short, concise, 
and descriptive and because it is extensively 
used in guidance literature. . . .” 

The book should be a useful one to teachers 
and administrators in secondary education. 
It is pitched well as to level, it is readable, it 
contains many answers to “how” questions, 
and it makes none of the naive claims which 
have marred so many similar publications. 
Hoppock begins with brief definitions of his 
terms but, wisely, does not spend pages re- 
hashing the history of the guidance movement. 
He relates the instructional, group approach 
to the individual counseling process objectively 
and well. He states the functions of the group 
approach in the first chapter and adheres to 
them for the rest of the way. 

Part II, Techniques, is straightforward and 
clear. He considers the techniques of follow- 
up, visits to jobs and institutions, group con- 
ferences, student survey of jobs, case confer- 
ences, laboratory investigations, self measure- 
ment, and a number of less important methods. 
The reviewer takes general issue with the 
author only in the realm of self measurement. 
In justice to Dr. Hoppock it must be stated 
that his approach is a cautious one, although 
sad personal experiences leave some reviewer- 
doubts as to how cautious all readers of Group 
Guidance will be with test scores. 







Part III, Evaluation, is excellent. Dr.
Hoppock brings in pertinent research and applies
it to his thesis in language which should hit the
bull's-eye for his intended audience. This
section should be required reading for those
students of professional education who believe
that anything is possible by talking at groups
of people. It will also be helpful to those
students of clinical psychology who believe
that all worthwhile results must come from
dealing with the individual.




Part IV, Appendixes, is a sound contribution 
to teaching methodology. Dr. Hoppock has 
been very successful in using the methods 
which he describes in these appendices and 
reading them convinces the reviewer that most 
of us can learn from what he has presented. 

A sound, well written book, objective in pres- 
entation, and cautious in approach which 
should be received well in educational circles. 


Milton E. Hahn
University of California, Los Angeles








New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should be sent to Donald G. Paterson, Editor, 
Department of Psychology, University of Minnesota, Minneapolis 14, Minnesota 


An index of nomograms. Douglas Payne Adams, Editor.
New York: John Wiley and Sons, Inc., 1950. Pp.
174. $4.00.

With brushes of comet's hair. Cornelia H. Bogert. New
York: Exposition Press, 1950. Pp. 165. $5.00.

A history of experimental psychology. Second edition.
Edwin G. Boring. New York: Appleton-Century-
Crofts, Inc., 1950. Pp. 777. $6.00.

Readings on modern methods of counseling. Arthur H.
Brayfield. New York: Appleton-Century-Crofts,
Inc., 1950. Pp. 566. $5.00.

College psychology. Warner Brown and Howard C.
Gilhousen. New York: Prentice-Hall, Inc., 1950.
Pp. 485. Cloth, $3.75; Paper, $2.85.

Marriage analysis. Harold T. Christensen. New
York: The Ronald Press Co., 1950. Pp. 510. $4.50.

Experimental designs. William G. Cochran and
Gertrude M. Cox. New York: John Wiley and Sons,
Inc., 1950. Pp. 454. $5.75.

Abnormal psychology and modern life. James C.
Coleman. Chicago: Scott, Foresman and Co., 1950. Pp.
600. $4.50.

Readability. Edgar Dale, Editor. Chicago: National
Council of Teachers of English, 1949. Pp. 44.
$.60 per copy. $.50 each for 10 or more.

Child guidance approach to juvenile delinquency. Eugene
Davidoff and Elinor S. Noetzel. New York: Child
Care Publications, 1950. $4.50.

Proceedings of second annual meeting of Industrial
Relations Research Association. Milton Derber, Editor.
New York: Industrial Relations Research
Association, 1949. Pp. 299. Annual subscription to IRRA
Publications $5.00.

Rating employee and supervisory performance. M. Joseph
Dooher and Vivienne Marquis, Editors. New York:
American Management Association, 1950. Pp. 192.
$3.75.

The organization of mental abilities. Jerome Edward
Doppelt. New York: Bureau of Publications,
Teachers College, Columbia University, 1950. Pp. 86.
$2.10.

Guidance services in smaller schools. Clifford P.
Froehlich. New York: McGraw-Hill Book Co., Inc., 1950.
Pp. 352. $3.75.

Guidance testing. Clifford P. Froehlich and Arthur L.
Benson. Chicago: Science Research Associates, 1949.
Pp. 112. $1.00.

Psychology. Henry E. Garrett. New York: American
Book Co., 1940. Pp. 323. $3.00.

Fundamental statistics in psychology and education.
Second edition. J. P. Guilford. New York:
McGraw-Hill Book Co., Inc., 1950. Pp. 633. $5.00.

Educational psychology. Edwin R. Guthrie and Francis
F. Powers. New York: The Ronald Press Co., 1950.
Pp. 524. $4.00.

Counseling adolescents. S. A. Hamrin and Blanche B.
Paulson. Chicago: Science Research Associates,
1949. Pp. 380. $3.50.

The handbook of child guidance. Ernest Harms, Editor. 
New York: Child Care Publications, 1950. Pp. 751. 
$6.00. 

Industrial psychology. Thomas Willard Harrell. New 
York: Rinehart and Co., Inc., 1949. Pp. 462. 
$4.25. 

Personality, development and assessment. Charles M.
Harsh and H. G. Schrickel. New York: The Ronald
Press Co., 1950. Pp. 518. $5.00.

The organization of behavior. D. O. Hebb. New York:
John Wiley and Sons, Inc., 1949. Pp. 335. $4.00.

Situational factors in leadership. John K. Hemphill.
Columbus: Bureau of Educational Research, Ohio
State University, 1949. Pp. 136. $3.00, cloth;
$2.50, paper.

A miniature textbook on feeblemindedness. Leo Kanner.
New York: Child Care Publications, 1950. Pp. 33.
$1.25.

Gestalt psychology, its nature and significance. David
Katz. New York: The Ronald Press Co., 1950.
$3.00.

Mental tests in clinics for children. Grace H. Kent.
New York: D. Van Nostrand Co., Inc., 1950. Pp.
180. $2.45.

The biology of human starvation. Ancel Keys, et al.
Minneapolis: University of Minnesota Press, 1950.
Pp. 1360. $24.00.

The yearbook of psychoanalysis. Sandor Lorand,
Managing Editor. New York: International Universities
Press, Inc., 1950. Pp. 317. $7.50.

The first two decades of life. Frieda K. Merry and
Ralph V. Merry. New York: Harper and Brothers,
1950. Pp. 581. $3.75.

Juvenile delinquency in modern society. Martin H.
Neumeyer. New York: D. Van Nostrand Co., Inc.,
1950. Pp. 335. $3.75.

Social psychology. Theodore M. Newcomb. New
York: Dryden Press, 1950. Pp. 800. $4.50.

Child development. Willard C. Olson. Boston: D. C.
Heath and Co., 1949. Pp. 430. $4.00.

An introduction to therapeutic counseling. E. H. Porter.
Boston: Houghton Mifflin Co., 1950. Pp. 223.
$2.75.

Human ability. C. Spearman and Ll. Wynn Jones.
New York: The Macmillan Co., 1950. Pp. 198.
$2.50.

Criminology. Revised edition. Donald R. Taft. New
York: The Macmillan Co., 1950. Pp. 704. $5.50.

Management behavior and foreman attitude. David N.
Ulrich, Donald R. Booz, and Paul R. Lawrence.
Boston: Graduate School of Business Administration,
Harvard University, 1950. Pp. 56. $.75.

Report of the Proceedings of the Second International
Congress for the Education of Maladjusted Children.
I. C. van Houte and Berthold Stokvis, Editors.
Amsterdam, Holland: Systemen Keesing, Ruysdaelstraat
71, 1950. Pp. 448. $4.50.

Social class in America. W. Lloyd Warner, Marchia
Meeker, and Kenneth W. Eells. Chicago: Science
Research Associates, 1949. Pp. 274. $4.25.

Blindness. Paul A. Zahl. Princeton: Princeton
University Press, 1950. Pp. 576. $7.50.

Human behavior and the principle of least effort. George
Kingsley Zipf. Cambridge: Addison-Wesley Press,
Inc., 1949. Pp. 573.

Directory of guidance agencies. Ethical Practices Com- 
mittee of the National Vocational Guidance Associa- 
tion. St. Louis: Dr. Nathan Kohn, Washington 
University, 1950. $1.00. 

Revere Safety Test. Revere Copper and Brass, Inc. 
Chicago: Science Research Associates, 1949. Test, 
20 pages. Handbook, 60 pages. Test, $.30; Hand- 
book, $.60. 














Subscription Lists of the 
American Psychological Association 


MEMBERS AND AFFILIATES 
Approximately 9,800 names 


The American Psychological Association main- 
tains an address list of its members and affiliates, 
which is for sale providing the nature of its use is in 
conformity with the purposes of the Association. 


1950 Prices 
Envelopes addressed 


(advertiser furnishes envelopes and pays express charges)
Addresses on tape, not gummed 
(suitable for a mailing machine) 


STATE LISTS 
Priced according to number of names wanted 


SUPPLEMENTARY LISTS 


Approximately 3,500 names in total list 
Individual journal lists vary from 400 to 1,600 


The Association also maintains a list of subscribers 
who are not members of the Association (universi- 
ties, libraries, industrial laboratories, hospitals, other 
types of institutions, and individual subscribers). 
The general list for all journals includes all types. 
Each single journal has a more specialized circulation. 


For any one journal, envelopes addressed .... $15.00 
For any one journal, addresses on tape 

For all journals, envelopes addressed 

For all journals, addresses on tape 


For further information, write to 


American Psychological Association 
1515 Massachusetts Avenue Northwest 
Washington 5, D. C. 

















Recently Published 
PSYCHOLOGICAL MONOGRAPHS: GENERAL AND APPLIED
Volume 63, 1949 


FACIAL EXPRESSIONS OF EMOTION. James C. Coleman,
University of Southern California. #296, $1.00


A COMPARATIVE STUDY OF THE WHERRY-DOOLITTLE AND 
A MULTIPLE CUTTING-SCORE METHOD. Glen Grimsley, 
General Motors Institute. #297, $.75 


FACTOR ANALYSES OF TESTS AND CRITERIA: A COMPARATIVE 
STUDY OF TWO AAF PILOT POPULATIONS. William B. 
Michael, Princeton University. #298, $1.00 


THE APPRAISAL OF PARENT BEHAVIOR. Alfred L. Baldwin,
Joan Kalhorn, and Fay Huffman Breese, Fels Research
Institute. #299, $1.50


STUDIES OF IDENTICAL TWINS REARED APART. The late 
Barbara S. Burks; and Anne Roe, New York City. #300, $1.00 


COLOR PREFERENCES OF PSYCHIATRIC GROUPS. Samuel J. 
Warner, New York City. #301, $.75 


PERCEPTION OF BODY POSITION AND OF THE POSITION OF THE 
VISUAL FIELD. H. A. Witkin, Brooklyn College. #302, $1.00 


AN EXPERIMENTAL EXAMINATION OF THE THEMATIC APPER- 
CEPTION TECHNIQUE IN CLINICAL DIAGNOSIS. A. A. 
Hartman, Boston University. #303, $1.00 


RELIGION AND HUMANITARIANISM: A STUDY OF INSTITU- 
TIONAL IMPLICATIONS. Clifford Kirkpatrick, Indiana Uni- 
versity. #304, $.75 


THE DEVELOPMENT AND VALIDATION OF A SET OF MUSICAL
ABILITY TESTS. Robert W. Lundin, Hamilton College.


A COMPARISON OF TWO TESTS OF INTELLIGENCE ADMINISTERED
TO ADULTS. Anna S. Elonen, University of Chicago.


The 1949 volume of the Psychological Monographs consists of eleven 
separate issues. Orders for any of these can be placed separately at 
the prices listed above, or the entire volume can be ordered for $6.00. 





American Psychological Association 


1515 Massachusetts Avenue N. W., Washington 5, D. C. 

















Reprints Now Available 


APA COMMITTEE ON ETHICAL STANDARDS 
FOR PSYCHOLOGY. Ethical Standards 
for the Distribution of Psychological 
Tests and Diagnostic Aids. Reprinted 
from the American Psychologist, No- 
vember, 1949. 8 pages, $.10. 


DUNCKER, KARL. On Problem Solving. 
Recently reprinted. Psychological 
Monograph #270, 1945. 128 pages, $2.50. 


EVANS, JEAN. Miller. A case history re- 
printed from the Journal of Abnormal 
and Social Psychology, April, 1950. 22 
pages, one for $.25, 50 for $10.00. 


WOLFLE, HELEN M. Personnel Placement 
Activities of the American Psychologi- 
cal Association. Reprinted from the 
American Psychologist, June, 1950. 8 
pages, $.10. 


Order from 


AMERICAN PSYCHOLOGICAL ASSOCIATION 
1515 Massachusetts Avenue N. W. 
Washington 5, D. C. 














